FISHER INFORMATION AND THE CENTRAL LIMIT THEOREM 



S. G. BOBKOV, G. P. CHISTYAKOV, AND F. GÖTZE

Abstract. An Edgeworth-type expansion is established for the relative Fisher information distance to the class of normal distributions of sums of i.i.d. random variables satisfying moment conditions. The validity of the central limit theorem is studied via properties of the Fisher information along convolutions.



1. Introduction 

Given a random variable $X$ with an absolutely continuous density $p$, the Fisher information of $X$ (or of its distribution) is defined by

$$I(X) = I(p) = \int_{-\infty}^{+\infty} \frac{p'(x)^2}{p(x)}\,dx,$$

where $p'$ denotes a Radon-Nikodym derivative of $p$. In all other cases, let $I(X) = +\infty$.

With the first two moments of $X$ being fixed, this quantity is minimized for the normal distribution (which is a variant of the Cramér-Rao inequality). That is, if $EX = a$ and $\mathrm{Var}(X) = \sigma^2$, then $I(X) \ge I(Z)$ for $Z \sim N(a, \sigma^2)$ with density

$$\varphi_{a,\sigma}(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-a)^2/(2\sigma^2)}.$$

Moreover, the equality $I(X) = I(Z)$ holds if and only if $X$ is normal.
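In many applications one uses the relative Fisher information

$$I(X\|Z) = I(X) - I(Z) = \int_{-\infty}^{+\infty} \Big(\frac{p'(x)}{p(x)} - \frac{\varphi_{a,\sigma}'(x)}{\varphi_{a,\sigma}(x)}\Big)^2 p(x)\,dx,$$

which serves as a strong measure of the non-Gaussianity of $X$. For example, it dominates the relative entropy, or Kullback-Leibler distance, of the distribution of $X$ to the standard normal distribution; more precisely (cf. Stam [S]), in the case $a = 0$, $\sigma = 1$,

$$\frac{1}{2}\, I(X\|Z) \ge D(X\|Z) = \int_{-\infty}^{+\infty} p(x) \log\frac{p(x)}{\varphi_{0,1}(x)}\,dx. \tag{1.1}$$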

We consider the scheme of a sequence of sums of independent identically distributed random variables $(X_n)_{n \ge 1}$. Assuming that $EX_1 = 0$ and $\mathrm{Var}(X_1) = 1$, define the normalized sums

$$Z_n = \frac{X_1 + \dots + X_n}{\sqrt{n}}.$$



Since the $Z_n$ converge weakly in distribution to $Z \sim N(0,1)$, one may wonder whether the convergence holds in a stronger sense. A remarkable observation in this respect is due to Barron and Johnson, who proved in [B-J] that

$$I(Z_n) \to I(Z), \quad \text{as } n \to \infty, \tag{1.2}$$

i.e., $I(Z_n\|Z) \to 0$, if and only if $I(Z_{n_0})$ is finite for some $n_0$. In particular, it suffices to require that $I(X_1) < +\infty$, although choosing larger values of $n_0$ considerably enhances the range of applicability of this theorem.

Quantitative estimates on the relative Fisher information in the central limit theorem have been partly developed as well. In the i.i.d. case, Barron and Johnson [B-J], and Artstein, Ball, Barthe and Naor [A-B-B-N1] derived the asymptotic bound $I(Z_n\|Z) = O(1/n)$ under the hypothesis that the distribution of $X_1$ satisfies an analytic inequality of Poincaré type (cf. also [J]). Poincaré inequalities hold for a large variety of "nice" probability distributions on the line, all of which have finite exponential moments.

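One of the aims of this paper is to study the exact asymptotics (or rates) of $I(Z_n\|Z)$ under standard moment conditions. We prove:

Theorem 1.1. Let $E|X_1|^s < +\infty$ for an integer $s \ge 2$, and assume $I(Z_{n_0}) < +\infty$ for some $n_0$. Then for certain coefficients $c_j$ we have, as $n \to \infty$,

$$I(Z_n\|Z) = \frac{c_1}{n} + \frac{c_2}{n^2} + \dots + \frac{c_{[(s-2)/2]}}{n^{[(s-2)/2]}} + o\Big(\frac{1}{n^{(s-2)/2}(\log n)^{(s-3)/2}}\Big). \tag{1.3}$$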



As it turns out, a similar expansion holds for the entropic distance $D(Z_n\|Z)$ as well, cf. [B-C-G2], revealing a number of interesting analogies in the asymptotic behavior of these two distances. In particular, in both cases each coefficient $c_j$ is given by a certain polynomial in the cumulants $\gamma_3, \dots, \gamma_{2j+1}$.

In order to describe these polynomials, we first note that, by the moment assumption, the cumulants

$$\gamma_r = i^{-r}\, \frac{d^r}{dt^r} \log E\, e^{itX_1}\Big|_{t=0}$$

are well-defined for all positive integers $r \le s$, and one may introduce the well-known functions

$$q_k(x) = \varphi(x) \sum H_{k+2j}(x)\, \frac{1}{r_1! \cdots r_k!} \Big(\frac{\gamma_3}{3!}\Big)^{r_1} \cdots \Big(\frac{\gamma_{k+2}}{(k+2)!}\Big)^{r_k}$$

involving the Chebyshev-Hermite polynomials $H_k$. Here $\varphi = \varphi_{0,1}$ denotes the density of the standard normal law, and the summation runs over all non-negative integer solutions $(r_1, \dots, r_k)$ to the equation $r_1 + 2r_2 + \dots + k r_k = k$ with $j = r_1 + \dots + r_k$.
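The definition of $q_k$ is easy to mechanize. The following Python sketch (stdlib only; the helper names `hermite`, `solutions`, `q_coefficients` and `q` are our own, not taken from the paper) enumerates the solutions $(r_1, \dots, r_k)$ of $r_1 + 2r_2 + \dots + kr_k = k$ and returns the Hermite coefficients $a_m$ in $q_k(x) = \varphi(x)\sum_m a_m H_m(x)$, assuming the cumulants are supplied as a dictionary `gamma` indexed by $r$.

```python
import math


def hermite(k, x):
    """Chebyshev-Hermite polynomial H_k(x) via H_{m+1} = x H_m - m H_{m-1}."""
    if k == 0:
        return 1.0
    h_prev, h = 1.0, x
    for m in range(1, k):
        h_prev, h = h, x * h - m * h_prev
    return h


def solutions(k):
    """All tuples (r_1, ..., r_k) of non-negative integers with sum_m m * r_m = k."""
    def rec(m, remaining):
        if m > k:
            if remaining == 0:
                yield ()
            return
        for r in range(remaining // m + 1):
            for rest in rec(m + 1, remaining - m * r):
                yield (r,) + rest
    return list(rec(1, k))


def q_coefficients(k, gamma):
    """Map m -> a_m such that q_k(x) = phi(x) * sum_m a_m H_m(x).

    gamma[r] must hold the cumulant gamma_r for r = 3, ..., k + 2.
    """
    coeffs = {}
    for r in solutions(k):
        j = sum(r)
        term = 1.0
        for m, r_m in enumerate(r, start=1):
            term *= (gamma[m + 2] / math.factorial(m + 2)) ** r_m / math.factorial(r_m)
        coeffs[k + 2 * j] = coeffs.get(k + 2 * j, 0.0) + term
    return coeffs


def q(k, x, gamma):
    """Evaluate the Edgeworth function q_k at the point x."""
    phi = math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
    return phi * sum(a * hermite(m, x) for m, a in q_coefficients(k, gamma).items())
```

For instance, `q_coefficients(2, {3: g3, 4: g4})` returns the two coefficients $\gamma_4/24$ (at $H_4$) and $\gamma_3^2/72$ (at $H_6$), i.e. the familiar second-order Edgeworth term.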

The functions $q_k$ are well defined for $k = 1, \dots, s-2$. They appear in Edgeworth-type expansions approximating the density of $Z_n$. We shall employ them to derive an expansion in powers of $1/n$ for the distance $I(Z_n\|Z)$, which leads to the following description of the coefficients in (1.3):

$$c_j = \sum_{k=2}^{2j} (-1)^k \sum \int_{-\infty}^{+\infty} (q_{r_1}' + x q_{r_1})(q_{r_2}' + x q_{r_2})\, q_{r_3} \cdots q_{r_k}\, \frac{dx}{\varphi^{k-1}}. \tag{1.4}$$

Here the inner summation is carried out over all positive integer tuples $(r_1, \dots, r_k)$ such that $r_1 + \dots + r_k = 2j$.

For example, $c_1 = \frac{1}{2}\gamma_3^2$, and in the case $s = 4$, (1.3) becomes

$$I(Z_n\|Z) = \frac{1}{2n}\,(EX_1^3)^2 + o\Big(\frac{1}{n(\log n)^{1/2}}\Big). \tag{1.5}$$

Hence, under the 4-th moment condition, we have $I(Z_n\|Z) \le \frac{C}{n}$ with some constant $C$ (which can actually be chosen to depend on $EX_1^4$ and $I(X_1)$ only).

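For $s = 6$, the result involves the coefficient $c_2$, which depends on $\gamma_3$, $\gamma_4$, and $\gamma_5$. If $\gamma_3 = 0$ (i.e. $EX_1^3 = 0$), we have $c_1 = 0$, $c_2 = \frac{1}{6}\gamma_4^2$, and then

$$I(Z_n\|Z) = \frac{1}{6n^2}\,(EX_1^4 - 3)^2 + o\Big(\frac{1}{n^2(\log n)^{3/2}}\Big).$$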
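As a numerical sanity check of (1.4) and of the two examples above, the sketch below evaluates $c_1$ and $c_2$ by Simpson's rule. It relies on the identity $(\varphi H_m)' + x\varphi H_m = m\,\varphi H_{m-1}$ (a consequence of $H_{m+1} = xH_m - mH_{m-1}$), so that every integrand in (1.4) becomes $\varphi$ times a polynomial. The explicit forms of $q_1, q_2, q_3$ are read off from the definition of $q_k$; the function names, the truncation of the integral to $[-12, 12]$ and the grid size are our own choices.

```python
import math


def hermite(k, x):
    """Chebyshev-Hermite polynomial H_k(x) (H_{-1} is taken to be 0)."""
    if k < 0:
        return 0.0
    if k == 0:
        return 1.0
    h_prev, h = 1.0, x
    for m in range(1, k):
        h_prev, h = h, x * h - m * h_prev
    return h


def q_coeffs(r, g3, g4, g5):
    """Hermite coefficients a_m of q_r / phi for r = 1, 2, 3."""
    return {
        1: {3: g3 / 6},
        2: {4: g4 / 24, 6: g3 ** 2 / 72},
        3: {5: g5 / 120, 7: g3 * g4 / 144, 9: g3 ** 3 / 1296},
    }[r]


def compositions(n, k):
    """Ordered k-tuples of positive integers summing to n."""
    if k == 1:
        yield (n,)
        return
    for first in range(1, n - k + 2):
        for rest in compositions(n - first, k - 1):
            yield (first,) + rest


def c_coefficient(j, g3, g4, g5=0.0, half_width=12.0, n=4000):
    """Approximate c_j from (1.4) for j = 1, 2 (needs q_1, ..., q_{2j-1})."""
    assert j in (1, 2)
    terms = [(k, rs) for k in range(2, 2 * j + 1) for rs in compositions(2 * j, k)]
    h = 2 * half_width / n
    total = 0.0
    for i in range(n + 1):
        x = -half_width + i * h
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)  # Simpson weights
        A, B = {}, {}
        for r in (1, 2, 3):
            coeffs = q_coeffs(r, g3, g4, g5)
            A[r] = sum(a * m * hermite(m - 1, x) for m, a in coeffs.items())  # (q_r' + x q_r) / phi
            B[r] = sum(a * hermite(m, x) for m, a in coeffs.items())          # q_r / phi
        s = 0.0
        for k, rs in terms:
            prod = A[rs[0]] * A[rs[1]]
            for r in rs[2:]:
                prod *= B[r]
            s += (-1) ** k * prod
        phi = math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
        total += weight * phi * s
    return total * h / 3
```

With $\gamma_3 \ne 0$ the same function returns $c_2$ as a combination of $\gamma_3, \gamma_4, \gamma_5$; we do not state a closed form for that general case here.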

More generally, the representation (1.3) simplifies if the first $k-1$ moments of $X_1$ coincide with the corresponding moments of $Z \sim N(0,1)$.

Corollary 1.2. Let $E|X_1|^s < +\infty$ ($s \ge 4$), and assume $I(Z_{n_0}) < +\infty$ for some $n_0$. Given $k = 3, 4, \dots, s$, assume that $\gamma_j = 0$ for all $3 \le j < k$. Then

$$I(Z_n\|Z) = \frac{\gamma_k^2}{(k-1)!}\,\frac{1}{n^{k-2}} + O\Big(\frac{1}{n^{k-1}}\Big) + o\Big(\frac{1}{n^{(s-2)/2}(\log n)^{(s-3)/2}}\Big). \tag{1.6}$$

This relation is consistent with an observation of Johnson, who noticed that if $\gamma_k \ne 0$, then $I(Z_n\|Z)$ cannot be asymptotically better than $n^{-(k-2)}$ ([J], Lemma 2.12).

Note that if $k < s/2$, the $O$-term in (1.6) dominates the $o$-term. But when $k > s/2$, it can be removed, and if $k > s/2 + 1$, (1.6) just says that

$$I(Z_n\|Z) = o\big(n^{-(s-2)/2}(\log n)^{-(s-3)/2}\big). \tag{1.7}$$






For the values $s = 2, 3$ there are no coefficients $c_j$ in the sum (1.3). In the case $s = 2$, Theorem 1.1 reduces to the Barron-Johnson theorem (1.2), while under a 3-rd moment assumption we only have

$$I(Z_n\|Z) = o\Big(\frac{1}{\sqrt{n}}\Big).$$

A similar observation holds for the whole range of reals $2 < s < 4$. Here the expansion (1.3) should be replaced by the bound (1.7). Although this bound is worse than (1.5), it cannot be essentially improved. As shown in [B-C-G2], it may happen that $E|X_1|^s < +\infty$ with $D(X_1) < +\infty$ (in fact, with $I(X_1) < +\infty$), while

$$D(Z_n\|Z) \ge \frac{c}{n^{(s-2)/2}(\log n)^{\eta}}, \quad n \ge n_1(X_1),$$

where the constant $c > 0$ depends on $s$ and an arbitrary prescribed value $\eta > s/2$. In view of (1.1), a similar lower bound therefore holds for $I(Z_n\|Z)$ as well.

Another interesting issue connected with the convergence theorem (1.2) and the expansion (1.3) is the characterization of distributions for which these results hold. Indeed, the condition $I(X_1) < +\infty$, corresponding to $n_0 = 1$ in Theorem 1.1, seems to be far too strong. To this end, we establish an explicit criterion for $I(Z_{n_0}) < +\infty$ to hold for sufficiently large $n_0$ in terms of the characteristic function $f_1(t) = E\, e^{itX_1}$ of $X_1$.

Theorem 1.3. Given independent identically distributed random variables $(X_n)_{n \ge 1}$ with finite second moment, the following assertions are equivalent:

a) For some $n_0$, $Z_{n_0}$ has finite Fisher information;

b) For some $n_0$, $Z_{n_0}$ has a density of bounded total variation;

c) For some $n_0$, $Z_{n_0}$ has a continuously differentiable density $p_{n_0}$ such that

$$\int_{-\infty}^{+\infty} |p_{n_0}'(x)|\,dx < +\infty;$$

d) For some $\varepsilon > 0$, $|f_1(t)| = O(t^{-\varepsilon})$, as $t \to +\infty$;

e) For some $\nu > 0$,

$$\int_{-\infty}^{+\infty} |f_1(t)|^{\nu}\, |t|\,dt < +\infty. \tag{1.8}$$

Property c) is a formally strengthened variant of b), although in general the two are not equivalent. (For example, the uniform distribution has a density of bounded total variation, but its density is not everywhere differentiable.)

Properties a)-c) are equivalent to each other without any moment assumption, while d)-e) are always necessary for the finiteness of $I(Z_n)$ with large $n$. These two last conditions show that the range of applicability of Theorem 1.1 is indeed rather wide, since almost all reasonable absolutely continuous distributions satisfy (1.8). The latter should be compared to, and viewed as a certain strengthening of, the following condition (sometimes called a smoothness condition)

$$\int_{-\infty}^{+\infty} |f_1(t)|^{\nu}\,dt < +\infty, \quad \text{for some } \nu > 0.$$

It is equivalent to the property that, for some $n_0$, $Z_{n_0}$ has a bounded continuous density $p_{n_0}$ (cf. e.g. [BR-R]). In this and only in this case, a uniform local limit theorem holds: $\Delta_n = \sup_x |p_n(x) - \varphi(x)| \to 0$, as $n \to \infty$. That this assertion is weaker than the convergence in Fisher information distance such as (1.2) can be seen from Shimizu's inequality $\Delta_n^2 \le c\, I(Z_n\|Z)$, which holds with some absolute constant $c$ ([Sh], [B-J], Lemma 1.5). Note in this connection that Shimizu's inequality may be strengthened in terms of the total variation distance as $\|p_n - \varphi\|_{\rm TV}^2 \le c\, I(Z_n\|Z)$. Using Theorem 1.3, this shows that (1.2) is equivalent to the convergence $\|p_n - \varphi\|_{\rm TV} \to 0$.

The paper is organized in the following way. We start with the description of general properties of densities having finite Fisher information (Section 2) and properties of the Fisher information as a functional on spaces of densities (lower semi-continuity and convexity, Section 3). Some of the properties and relations which we state for completeness may already be known; we apologize for being unable to find references for them.

In Sections 4-5 we turn to upper bounds needed mainly in the proof of Theorem 1.3. Further properties of densities emerging after several convolutions, as well as bounds under additional moment assumptions, are discussed in Sections 6-8. In Section 9 we complete the proof of Theorem 1.3, and in the next section we state basic lemmas on Edgeworth-type expansions which are needed in the proof of Theorem 1.1. Sections 11-12 are devoted to the proof itself. Some remarks leading to the particular case $s = 2$ in Theorem 1.1 (the Barron-Johnson theorem) are given in Section 13. Finally, in the last section we briefly describe the modifications needed to obtain Theorem 1.1 under moment assumptions with arbitrary real values of $s$.

2. General properties of densities with finite Fisher information

If a random variable $X$ has density $p$ with finite Fisher information

$$I(X) = \int_{-\infty}^{+\infty} \frac{p'(x)^2}{p(x)}\,dx, \tag{2.1}$$

then $p$ has to be absolutely continuous, and the derivative $p'(x)$ exists and is finite on a set of full Lebesgue measure.

One may write an equivalent definition by involving the score function $\rho(x) = \frac{p'(x)}{p(x)}$. In general $P\{p(X) > 0\} = 1$, so the random variable $\rho(X)$ is well defined with probability 1, and thus

$$I(X) = E\rho(X)^2. \tag{2.2}$$

However, strictly speaking, the integration in (2.1) should be restricted to the open set $\{x : p(x) > 0\}$.

For different purposes, it is useful to understand how the ratio $\frac{p'(x)^2}{p(x)}$ may behave when $p(x)$ is small or even vanishing. The behavior cannot be arbitrary when the Fisher information is finite. The following statement plays a "justifying" role in obtaining many Fisher information bounds on the density and its derivatives.

Proposition 2.1. Assume $X$ has density $p$ with finite Fisher information. If $p$ is differentiable at a point $x_0$ such that $p(x_0) = 0$, then $p'(x_0) = 0$.

Proof. If $p$ is differentiable in some neighborhood of $x_0$ and its derivative is continuous at this point, the statement is obvious.

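To cover the general case, for simplicity of notation let $x_0 = 0$ and assume that $c = p'(0) > 0$. Since $p(\varepsilon) = c\varepsilon + o(\varepsilon)$ as $\varepsilon \to 0$, one may choose $\varepsilon_0 > 0$ such that

$$\frac{3c}{4}\, x \le p(x) \le \frac{5c}{4}\, x, \quad \text{for all } 0 < x \le \varepsilon_0.$$

In particular, $p$ is positive on $(0, \varepsilon_0]$. Hence, by the definition (2.1),

$$I(X) \ge \int_0^{\varepsilon_0} \frac{p'(x)^2}{p(x)}\,dx \ge \frac{4}{5c} \int_0^{\varepsilon_0} \frac{p'(x)^2}{x}\,dx.$$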



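We split the last integral into the intervals $\Delta_n = (2^{-(n+1)}\varepsilon_0, 2^{-n}\varepsilon_0]$ and then estimate $x$ from above by $2^{-n}\varepsilon_0$ on each of them, which leads to

$$\frac{5c\,\varepsilon_0}{4}\, I(X) \ge \sum_{n=0}^{\infty} 2^n \int_{\Delta_n} p'(x)^2\,dx.$$

Now, applying Cauchy's inequality on $\Delta_n$ (which has length $2^{-(n+1)}\varepsilon_0$) and using $p(x) - p(\frac{x}{2}) \ge \frac{c}{8}\, x$ for $0 < x \le \varepsilon_0$, we obtain

$$\int_{\Delta_n} p'(x)^2\,dx \ge \frac{2^{n+1}}{\varepsilon_0} \Big(\int_{\Delta_n} p'(x)\,dx\Big)^2 \ge \frac{2^{n+1}}{\varepsilon_0} \cdot \frac{2^{-2n}(c\varepsilon_0)^2}{64} = \frac{c^2 \varepsilon_0}{32}\, 2^{-n}.$$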



As a result,

$$\frac{5c\,\varepsilon_0}{4}\, I(X) \ge \sum_{n=0}^{\infty} 2^n \cdot \frac{c^2 \varepsilon_0}{32}\, 2^{-n} = +\infty,$$

a contradiction with the finiteness of the Fisher information. Proposition 2.1 is proved.
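The mechanism of this proof can be seen on the density $p(x) = |x|$, $|x| \le 1$ (an illustrative choice of ours), which vanishes linearly at $x_0 = 0$ with $c = 1$ on the right side. This $p$ is not differentiable at $0$, so Proposition 2.1 does not apply to it directly; it only illustrates the dyadic estimate on $(0, \varepsilon_0]$: each interval $\Delta_n$ carries the same amount $\int_{\Delta_n} dx/x = \log 2$ of Fisher information, so the total diverges. A short Python check with $\varepsilon_0 = 1$ (helper names are ours):

```python
import math


def simpson(f, a, b, n=200):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3


def dyadic_fisher_mass(num_blocks, eps0=1.0):
    """int_{Delta_n} p'(x)^2 / p(x) dx for p(x) = x on Delta_n = (2^-(n+1) eps0, 2^-n eps0]."""
    masses = []
    for n in range(num_blocks):
        a, b = eps0 * 2.0 ** -(n + 1), eps0 * 2.0 ** -n
        masses.append(simpson(lambda x: 1.0 / x, a, b))  # p'(x)^2 / p(x) = 1 / x
    return masses


def proof_terms(num_blocks, c=1.0, eps0=1.0):
    """2^n * int_{Delta_n} p'(x)^2 dx for p(x) = c x, to compare with c^2 eps0 / 32."""
    return [2.0 ** n * c ** 2 * (eps0 * 2.0 ** -n - eps0 * 2.0 ** -(n + 1))
            for n in range(num_blocks)]


masses = dyadic_fisher_mass(30)
# sum(masses) is about 30 * log 2: the truncated Fisher integral grows without bound
```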

As an example illustrating a possible behavior as in Proposition 2.1, one may consider the beta distribution with parameters $\alpha = \beta = 3$, which has density

$$p(x) = 30\, x^2 (1-x)^2, \quad 0 \le x \le 1.$$

Then $X$ has finite Fisher information, although $p(x_0) = p'(x_0) = 0$ at $x_0 = 0$ and $x_0 = 1$.

More generally, if a density $p$ is supported and twice differentiable on a finite interval $[a,b]$, and if $p$ has finitely many zeros $x_0 \in [a,b]$ with $p'(x_0) = 0$ and $p''(x_0) > 0$ at any such point, then $X$ has finite Fisher information.
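For the beta example, $p'(x)^2/p(x) = 120\,(1-2x)^2$ on $(0,1)$, so $I(X) = 40$ exactly. The sketch below (Python, stdlib only; `fisher_information` and the `beta33_*` functions are hypothetical helpers of ours) approximates $I(p)$ by the midpoint rule over the open set $\{p > 0\}$, which never evaluates the ratio at the zeros $x_0 = 0, 1$:

```python
def beta33_pdf(x):
    """Beta(3, 3) density 30 x^2 (1 - x)^2 on (0, 1)."""
    return 30.0 * x ** 2 * (1.0 - x) ** 2 if 0.0 < x < 1.0 else 0.0


def beta33_dpdf(x):
    """Its derivative 60 x (1 - x) (1 - 2x)."""
    return 60.0 * x * (1.0 - x) * (1.0 - 2.0 * x) if 0.0 < x < 1.0 else 0.0


def fisher_information(pdf, dpdf, a, b, n=20000):
    """Midpoint-rule approximation of I(p) = int p'^2 / p over {p > 0} within (a, b)."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * h
        p = pdf(x)
        if p > 0.0:
            total += dpdf(x) ** 2 / p
    return total * h


I_beta = fisher_information(beta33_pdf, beta33_dpdf, 0.0, 1.0)  # close to 40
```

The ratio stays bounded near the zeros (it tends to $120$), which is exactly the behavior described in the paragraph above.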

Now, let us return to the definitions (2.1)-(2.2). By Cauchy's inequality,

$$I(X)^{1/2} = \big(E\rho(X)^2\big)^{1/2} \ge E|\rho(X)| = \int_{\{p(x) > 0\}} |p'(x)|\,dx.$$

Here, by Proposition 2.1, the last integral may be extended to the whole real line without any change, and then it represents the total variation of the function $p$ in the usual sense of the theory of functions:

$$\|p\|_{\rm TV} = \sup \sum_{k=1}^{n} |p(x_k) - p(x_{k-1})|,$$

where the supremum runs over all finite collections $x_0 < x_1 < \dots < x_n$.

In the sequel, we consider this norm also for densities which are not necessarily continuous, and then it is natural to require that, for each $x$, the value $p(x)$ lies in the closed segment $\Delta(x)$ with endpoints $p(x-)$ and $p(x+)$. Note that if we change $p(x)$ at a point of discontinuity so that $p(x)$ leaves $\Delta(x)$, then the measure with density $p$ is unchanged, while $\|p\|_{\rm TV}$ increases.

Thus, if the Fisher information $I(X)$ is finite, the density $p$ of $X$ is a function of bounded variation, so the limits

$$p(-\infty) = \lim_{x \to -\infty} p(x), \qquad p(+\infty) = \lim_{x \to +\infty} p(x)$$

exist and are finite. But, since $p$ is a density (hence integrable), these limits must be zero. In addition, for any $x$,

$$p(x) = \int_{-\infty}^{x} p'(y)\,dy \le \int_{-\infty}^{x} |p'(y)|\,dy \le \sqrt{I(X)}.$$

We can summarize these elementary observations in the following:

Proposition 2.2. If $X$ has density $p$ with finite Fisher information $I(X)$, then $p(-\infty) = p(+\infty) = 0$, and the density has finite total variation satisfying

$$\|p\|_{\rm TV} = \int_{-\infty}^{+\infty} |p'(x)|\,dx \le \sqrt{I(X)}.$$

In particular, $p$ is bounded: $\max_x p(x) \le \sqrt{I(X)}$.

Corollary 2.3. If $X$ has finite Fisher information, then its characteristic function $f(t) = E\, e^{itX}$ admits the bound

$$|f(t)| \le \frac{\sqrt{I(X)}}{|t|}, \quad t \in \mathbb{R}.$$

Indeed, using Proposition 2.2, one may integrate by parts,

$$\int_{-\infty}^{+\infty} p(x)\, d e^{itx} = -\int_{-\infty}^{+\infty} e^{itx}\, p'(x)\,dx,$$

which gives $|t|\,|E\, e^{itX}| \le \int_{-\infty}^{+\infty} |p'(x)|\,dx \le \sqrt{I(X)}$.

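Another immediate consequence of Proposition 2.2 is that both $p$ and $p'$ are square integrable, that is, $p$ belongs to the Sobolev space $W_1^2 = W_1^2(-\infty, +\infty)$ of all absolutely continuous functions $u$ on the real line with finite Euclidean (Hilbert) norm

$$\|u\|_{W_1^2}^2 = \int_{-\infty}^{+\infty} u(x)^2\,dx + \int_{-\infty}^{+\infty} u'(x)^2\,dx.$$

More precisely,

$$\int_{-\infty}^{+\infty} p'(x)^2\,dx = \int_{-\infty}^{+\infty} \frac{p'(x)^2}{p(x)}\, p(x)\,dx \le \max_x p(x) \int_{-\infty}^{+\infty} \frac{p'(x)^2}{p(x)}\,dx \le I(X)^{3/2}. \tag{2.3}$$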
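The bounds of Proposition 2.2, Corollary 2.3 and (2.3) can be checked on the beta density $p(x) = 30x^2(1-x)^2$ from the example above. In the sketch below (our own helper names; plain midpoint quadrature), one expects $\|p\|_{\rm TV} = 2\max_x p(x) = 3.75$, $\int p'(x)^2\,dx = 3600/210$ and $I(X) = 40$, so all three bounds hold with room to spare:

```python
import cmath
import math


def integrate(f, a, b, n=20000):
    """Midpoint rule on [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def p(x):
    return 30.0 * x ** 2 * (1.0 - x) ** 2            # Beta(3, 3) density on (0, 1)


def dp(x):
    return 60.0 * x * (1.0 - x) * (1.0 - 2.0 * x)


fisher = integrate(lambda x: dp(x) ** 2 / p(x), 0.0, 1.0)   # I(X)
tv = integrate(lambda x: abs(dp(x)), 0.0, 1.0)               # ||p||_TV, Proposition 2.2
peak = p(0.5)                                                # max_x p(x)
sobolev = integrate(lambda x: dp(x) ** 2, 0.0, 1.0)          # left-hand side of (2.3)


def char_fn(t):
    """f(t) = E exp(itX), computed by quadrature."""
    return integrate(lambda x: cmath.exp(1j * t * x) * p(x), 0.0, 1.0)


assert tv <= math.sqrt(fisher) and peak <= math.sqrt(fisher)
assert sobolev <= fisher ** 1.5
for t in (1.0, 5.0, 10.0, 20.0, 50.0):
    assert abs(char_fn(t)) <= math.sqrt(fisher) / t          # Corollary 2.3
```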

Since the total variation norm $\|p\|_{\rm TV}$ can be estimated in terms of the Fisher information, it is natural to ask whether it is possible to bound the total variation distance from $p$ to the normal density in terms of the relative Fisher information. This suggests the following bound.

Proposition 2.4. If $X$ has mean zero, variance one, and density $p$ with finite Fisher information, then

$$\|p - \varphi\|_{\rm TV} \le 4\sqrt{I(X\|Z)}, \tag{2.4}$$

where $Z$ has the standard normal density $\varphi$.



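Proof. Using

$$p'(x) - \varphi'(x) = \Big(\frac{p'(x)}{p(x)} - \frac{\varphi'(x)}{\varphi(x)}\Big)\, p(x) - x\,\big(p(x) - \varphi(x)\big) \qquad (p(x) > 0)$$

and applying Cauchy's inequality, we may write

$$\int_{-\infty}^{+\infty} |p'(x) - \varphi'(x)|\,dx \le \sqrt{I(X\|Z)} + \int_{-\infty}^{+\infty} |x|\,|p(x) - \varphi(x)|\,dx. \tag{2.5}$$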

The last integral represents a weighted total variation distance between the distributions of $X$ and $Z$ with weight function $w(x) = |x|$.

At this step we apply the following extension of the Csiszár-Kullback-Pinsker inequality (CKP) to weighted total variation distances, which was proposed by Bolley and Villani, cf. [B-V], Theorem 2.1 (ii). If $X$ and $Y$ are random variables with densities $p$ and $q$, and $w(x) \ge 0$ is a measurable function, then

$$\Big(\int_{-\infty}^{+\infty} w(x)\,|p(x) - q(x)|\,dx\Big)^2 \le C\, D(X\|Y) = C \int_{-\infty}^{+\infty} p(x) \log\frac{p(x)}{q(x)}\,dx,$$

where

$$C = 2\Big(1 + \log \int_{-\infty}^{+\infty} e^{w(x)^2}\, q(x)\,dx\Big).$$

The inequality also holds in the setting of abstract measurable spaces, and when $w = 1$ it yields the classical CKP inequality with an additional factor 2.
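It may help to see the constant $C$ for the Gaussian reference density $q = \varphi$ and weights of the form $w(x) = \sqrt{t/2}\,|x|$, $0 < t < 1$ (the family used in the next step). Then $\int e^{w^2}\varphi\,dx = (1-t)^{-1/2}$, so $C = 2 + \log\frac{1}{1-t}$, and the inequality gives $\int |x|\,|p - \varphi|\,dx \le K(t)\, D(X\|Z)^{1/2}$ with $K(t) = \sqrt{2C/t}$. The Python sketch below (function names are ours) evaluates $K$ at $t = 1 - 1/e$, checks the Gaussian integral by quadrature, and runs a crude grid search over $t$; it suggests that optimizing $t$ improves the constant only marginally (to about $3.0$), and that the coefficient $1 + 3.1/\sqrt{2}$ in front of $\sqrt{I(X\|Z)}$ stays below $4$.

```python
import math


def bv_constant(t):
    """C = 2 (1 + log int e^{w^2} phi dx) for w(x) = sqrt(t/2)|x|, i.e. 2 + log(1/(1-t))."""
    return 2.0 + math.log(1.0 / (1.0 - t))


def weighted_tv_factor(t):
    """K(t) in:  int |x| |p - phi| dx <= K(t) * D(X||Z)^(1/2)."""
    return math.sqrt(2.0 * bv_constant(t) / t)


def gaussian_moment(t, half_width=40.0, n=80000):
    """Midpoint approximation of int exp(t x^2 / 2) phi(x) dx."""
    h = 2 * half_width / n
    total = 0.0
    for i in range(n):
        x = -half_width + (i + 0.5) * h
        total += math.exp(-(1.0 - t) * x * x / 2) / math.sqrt(2 * math.pi)
    return total * h


t0 = 1.0 - 1.0 / math.e
K0 = weighted_tv_factor(t0)                                      # about 3.08, below 3.1
best_K = min(weighted_tv_factor(k / 100) for k in range(1, 100))
final_constant = 1.0 + 3.1 / math.sqrt(2.0)                      # about 3.19, below 4
```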

In our case $Y = Z$, $q = \varphi$, and taking $w(x) = \sqrt{t/2}\,|x|$ ($0 < t < 1$), we get

$$\frac{t}{2}\Big(\int_{-\infty}^{+\infty} |x|\,|p(x) - \varphi(x)|\,dx\Big)^2 \le \Big(2 + \log\frac{1}{1-t}\Big)\, D(X\|Z).$$

One may choose, for example, $t = 1 - \frac{1}{e}$, and recalling (1.1), we arrive at

$$\int_{-\infty}^{+\infty} |x|\,|p(x) - \varphi(x)|\,dx \le 3.1\, D(X\|Z)^{1/2} \le \frac{3.1}{\sqrt{2}}\, I(X\|Z)^{1/2}.$$

It remains to use this bound in (2.5); since $1 + 3.1/\sqrt{2} < 4$, (2.4) follows.

3. Fisher information as a functional

It is worthwhile to discuss separately a few general properties of the Fisher information viewed as a functional on the space of densities. We start with topological properties.

Proposition 3.1. Let $(X_n)_{n \ge 1}$ be a sequence of random variables, and let $X$ be a random variable such that $X_n \Rightarrow X$ weakly in distribution. Then

$$I(X) \le \liminf_{n \to \infty} I(X_n). \tag{3.1}$$



Denote by $\mathfrak{P}_1$ the collection of all (probability) densities on the real line with finite Fisher information, and let $\mathfrak{P}_1(I)$ denote the subset of all densities with Fisher information at most $I > 0$. On the set $\mathfrak{P}_1$ the relation (3.1) may be written as

$$I(p) \le \liminf_{n \to \infty} I(p_n), \tag{3.2}$$

which holds under the condition that the corresponding distributions converge weakly, i.e.,

$$\int_{-\infty}^{a} p_n(x)\,dx \to \int_{-\infty}^{a} p(x)\,dx, \quad \text{for all } a \in \mathbb{R}. \tag{3.3}$$

Hence, every $\mathfrak{P}_1(I)$ is closed in the weak topology. In fact, inside such sets (3.3) can be strengthened to the convergence in the $L^1$-metric,

$$\int_{-\infty}^{+\infty} |p_n(x) - p(x)|\,dx \to 0. \tag{3.4}$$



Proposition 3.2. On every set $\mathfrak{P}_1(I)$ the weak topology with convergence (3.3) and the topology generated by the $L^1$-norm coincide, and the Fisher information is a lower semi-continuous functional on this set.

Proof. For the proof of Proposition 3.1, one may assume that $I(X_n) \to I$ for some (finite) constant $I$. Then, for sufficiently large $n$, the $X_n$ have absolutely continuous densities $p_n$ with Fisher information at most $I + 1$. By Proposition 2.2, such densities are uniformly bounded and have uniformly bounded variations. Hence, by the second Helly theorem (cf. e.g. [K-F]), there are a subsequence $p_{n_k}$ and a function $p$ of bounded variation such that $p_{n_k}(x) \to p(x)$, as $k \to \infty$, for all points $x$. Necessarily, $p(x) \ge 0$ and $\int_{-\infty}^{+\infty} p(x)\,dx \le 1$. Since the sequence of distributions of $X_n$ is tight (or weakly pre-compact), it also follows that $\int_{-\infty}^{+\infty} p(x)\,dx = 1$. Hence, $X$ has an absolutely continuous distribution with $p$ as its density, and the weak convergence (3.3) holds.

For the proof of Proposition 3.2, a similar argument should be applied to an arbitrary prescribed subsequence $p_{n_k}$, where we obtain $p(x) = \lim_{l \to \infty} p_{n_{k_l}}(x)$ for some further subsequence. By Scheffé's lemma, this property implies the convergence in $L^1$-norm, that is, (3.4) holds along $n_{k_l}$. This implies the convergence in $L^1$ for the whole sequence $p_n$, which is the assertion of Proposition 3.2.

To continue the proof of Proposition 3.1, for simplicity of notation assume that the subsequence constructed in the first step is actually the whole sequence. By (2.3),

$$\int_{-\infty}^{+\infty} p_n'(x)^2\,dx \le (I + 1)^{3/2},$$

which implies that the derivatives $p_n'$ are uniformly integrable on every finite interval. By the Dunford-Pettis compactness criterion for the space $L^1$ (over finite measures), there is a subsequence $p_{n_k}'$ which converges to some locally integrable function $u$ in the sense that

$$\int_A p_{n_k}'(x)\,dx \to \int_A u(x)\,dx, \quad \text{as } k \to \infty, \tag{3.5}$$

for any bounded Borel set $A \subset \mathbb{R}$. (This is the weak $\sigma(L^1, L^\infty)$ convergence on finite intervals.) Note that, according to Proposition 2.1, $p_{n_k}'$ may be replaced in (3.5) with the sequence $p_{n_k}'\, 1_{\{p_{n_k} > 0\}}$, which thus converges to $u$ as well.
Taking a finite interval $A = (a,b)$ in (3.5), we get

$$\int_a^b u(x)\,dx = p(b) - p(a),$$

which means that $p$ is (locally) absolutely continuous. Furthermore, since

$$\int_{-\infty}^{+\infty} |u(x)|\,dx$$

is finite, we conclude that $u \in L^1(\mathbb{R})$, thus representing a Radon-Nikodym derivative: $u(x) = p'(x)$. Again, for simplicity of notation, assume the subsequence of derivatives obtained is actually the whole sequence.
Next, consider the sequence of functions

$$\xi_n(x) = \frac{p_n'(x)}{\sqrt{p_n(x)}}\, 1_{\{p_n(x) > 0\}}.$$

They have $L^2(\mathbb{R})$-norm bounded by $\sqrt{I + 1}$ (for large $n$). Since the unit ball of $L^2$ is weakly compact, there is a subsequence $\xi_{n_k}$ which converges weakly to some function $\xi \in L^2$, that is,

$$\int_{-\infty}^{+\infty} \xi_{n_k}(x)\, q(x)\,dx \to \int_{-\infty}^{+\infty} \xi(x)\, q(x)\,dx,$$

for any $q \in L^2$. As a consequence,

$$\int_{-\infty}^{+\infty} \xi_{n_k}(x) \sqrt{p_{n_k}(x)}\, q(x)\,dx \to \int_{-\infty}^{+\infty} \xi(x) \sqrt{p(x)}\, q(x)\,dx,$$

due to the uniform boundedness and pointwise convergence of $p_n$. In other words, again omitting sub-indices, the functions $p_n'\, 1_{\{p_n > 0\}}$ converge weakly in $L^2$ to the function $\xi\sqrt{p}$. In particular, for $q = 1_A$ with an arbitrary bounded Borel set $A \subset \mathbb{R}$,

$$\int_A p_n'(x)\, 1_{\{p_n(x) > 0\}}\,dx \to \int_A \xi(x) \sqrt{p(x)}\,dx.$$
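As a result, we have obtained two limits for $p_n'\, 1_{\{p_n > 0\}}$, which must coincide, i.e., we get $\xi\sqrt{p} = u = p'$ a.e. Hence, $p' = 0$ a.e. on the set $\{p(x) = 0\}$, and $\xi = \frac{p'}{\sqrt{p}}$ on the set $\{p(x) > 0\}$.

Finally, the weak convergence $\xi_{n_k} \to \xi$ in $L^2$ yields, as in any Banach space,

$$I(p) = \Big\|\frac{p'}{\sqrt{p}}\, 1_{\{p > 0\}}\Big\|_2^2 \le \|\xi\|_2^2 \le \liminf_{k \to \infty} \|\xi_{n_k}\|_2^2 = \liminf_{k \to \infty} I(p_{n_k}) = I.$$

Thus, Proposition 3.1 is proved.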

Another general property of the Fisher information is its convexity, that is, we have the inequality

$$I(p) \le \sum_{i=1}^{n} \alpha_i I(p_i), \tag{3.6}$$

where $p = \sum_{i=1}^{n} \alpha_i p_i$ with arbitrary densities $p_i$ and weights $\alpha_i > 0$, $\sum_{i=1}^{n} \alpha_i = 1$. This readily follows from the fact that the homogeneous function $R(u,v) = u^2/v$ is convex on the upper half-plane $u \in \mathbb{R}$, $v > 0$. Moreover, Cohen [C] showed that the inequality (3.6) is strict.

As a consequence, the collection $\mathfrak{P}_1(I)$ of all densities on the real line with Fisher information at most $I$ represents a convex closed set in the space $L^1 = L^1(\mathbb{R})$ (for the strong and weak topologies).
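The convexity (3.6) can be observed numerically. In the sketch below (Python; the two-component normal mixture and the helper names are illustrative choices of ours), $p = \alpha\,\varphi(\cdot + m) + (1-\alpha)\,\varphi(\cdot - m)$ mixes two densities with $I(p_i) = 1$, so (3.6) predicts $I(p) \le 1$, with strict inequality when $m \ne 0$. The pointwise convexity of $R(u,v) = u^2/v$, which underlies (3.6), is checked on a few sample points as well.

```python
import math


def normal_pdf(x, mean):
    return math.exp(-(x - mean) ** 2 / 2) / math.sqrt(2 * math.pi)


def mixture_fisher(alpha, m, half_width=20.0, n=40000):
    """Midpoint approximation of I(p) for p = alpha N(-m, 1) + (1 - alpha) N(m, 1)."""
    h = 2 * half_width / n
    total = 0.0
    for i in range(n):
        x = -half_width + (i + 0.5) * h
        p1, p2 = normal_pdf(x, -m), normal_pdf(x, m)
        p = alpha * p1 + (1 - alpha) * p2
        dp = -alpha * (x + m) * p1 - (1 - alpha) * (x - m) * p2
        if p > 0.0:
            total += dp * dp / p
    return total * h


def R(u, v):
    """The homogeneous function u^2 / v on the half-plane v > 0."""
    return u * u / v


# Jensen's inequality for R; integrating it in x with u = p_i', v = p_i gives (3.6).
pairs = [((1.0, 0.5), (-2.0, 0.1)), ((0.3, 2.0), (0.3, 0.2)), ((-1.0, 1.0), (4.0, 3.0))]
a = 0.3
for (u1, v1), (u2, v2) in pairs:
    lhs = R(a * u1 + (1 - a) * u2, a * v1 + (1 - a) * v2)
    assert lhs <= a * R(u1, v1) + (1 - a) * R(u2, v2) + 1e-12
```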

We need to extend Jensen's inequality (3.6) to arbitrary "continuous" convex mixtures of densities. In order to formulate this more precisely, recall the definition of mixtures. Denote by $\mathfrak{P}$ the collection of all densities, which represents a closed subset of $L^1$ with the weak $\sigma(L^1, L^\infty)$ topology. For any Borel set $A \subset \mathbb{R}$, the functionals $q \to \int_A q(x)\,dx$ are bounded and continuous on $\mathfrak{P}$. So, given a Borel probability measure $\pi$ on $\mathfrak{P}$, one may introduce the probability measure on the real line

$$\mu(A) = \int_{\mathfrak{P}} \Big[\int_A q(x)\,dx\Big]\, d\pi(q). \tag{3.7}$$

It is absolutely continuous with respect to Lebesgue measure and has some density $p(x) = \frac{d\mu(x)}{dx}$, called the (convex) mixture of densities with mixing measure $\pi$. For short,

$$p(x) = \int_{\mathfrak{P}} q(x)\,d\pi(q).$$


Proposition 3.3. If $p$ is a convex mixture of densities with mixing measure $\pi$, then

$$I(p) \le \int_{\mathfrak{P}} I(q)\,d\pi(q). \tag{3.8}$$

Proof. Note that the integral in (3.8) makes sense, since the functional $q \to I(q)$ is lower semi-continuous and hence Borel measurable on $\mathfrak{P}$ (Proposition 3.1). We may assume that this integral is finite, so that $\pi$ is supported on the convex (Borel measurable) set $\mathfrak{P}_1 = \bigcup_I \mathfrak{P}_1(I)$.

Identifying densities with the corresponding probability measures (having these densities), we consider $\mathfrak{P}_1$ as a subset of the locally convex space $E$ of all finite measures $\mu$ on the real line endowed with the weak topology.

Step 1. Suppose that the measure $\pi$ is supported on some convex compact set $K$ contained in $\mathfrak{P}_1(I)$. Since the functional $q \to I(q)$ is finite, convex and lower semi-continuous on $K$, it admits the representation

$$I(q) = \sup_{l \in \mathcal{L}} l(q), \quad q \in K,$$

where $\mathcal{L}$ denotes the family of all continuous affine functionals $l$ on $E$ such that $l(q) < I(q)$ for all $q \in K$ (cf. e.g. Meyer [M], Chapter XI, Theorem T7). In our particular case, any such functional acts on probability measures as $l(\mu) = \int \psi(x)\,d\mu(x)$ with some bounded continuous function $\psi$ on the real line. Hence,

$$I(q) = \sup_{\psi \in \Psi} \int_{-\infty}^{+\infty} q(x)\psi(x)\,dx,$$

for some family $\Psi$ of bounded continuous functions on $\mathbb{R}$. An explicit description of $\Psi$ would be of interest, but this question will not be pursued here. As a consequence, by the definition (3.7) for the measure $\mu$ with density $p$,

$$\int I(q)\,d\pi(q) \ge \sup_{\psi \in \Psi} \int \Big[\int_{-\infty}^{+\infty} q(x)\psi(x)\,dx\Big]\, d\pi(q) = \sup_{\psi \in \Psi} \int_{-\infty}^{+\infty} p(x)\psi(x)\,dx = I(p),$$

which is the desired inequality (3.8).

Step 2. Suppose that $\pi$ is supported on $\mathfrak{P}_1(I)$ for some $I > 0$. Since any finite measure on $E$ is Radon, and since the set $\mathfrak{P}_1(I)$ is closed and convex, there is an increasing sequence of compact subsets $K_n \subset \mathfrak{P}_1(I)$ such that $\pi(\bigcup_n K_n) = 1$. Moreover, $K_n$ can be chosen to be convex (since the closure of the convex hull will be compact as well). Let $\pi_n$ denote the normalized restriction of $\pi$ to $K_n$ (with $n$ sufficiently large, so that $c_n = \pi(K_n) > 0$), and define its barycenter

$$p_n(x) = \int_{K_n} q(x)\,d\pi_n(q). \tag{3.9}$$

From (3.7) it follows that the measures with densities $p_n$ converge weakly to the measure $\mu$ with density $p$; hence the relation (3.2) holds: $I(p) \le \liminf_{n \to \infty} I(p_n)$. On the other hand, by the previous step,

$$I(p_n) \le \int_{K_n} I(q)\,d\pi_n(q) \le \frac{1}{c_n} \int_{\mathfrak{P}_1(I)} I(q)\,d\pi(q) \to \int_{\mathfrak{P}_1(I)} I(q)\,d\pi(q), \tag{3.10}$$

which yields (3.8).

Step 3. In the general case, we may apply Step 2 to the normalized restrictions $\pi_n$ of $\pi$ to the sets $K_n = \mathfrak{P}_1(n)$. Again, for the densities $p_n$ defined as in (3.9), we obtain (3.10), where $\mathfrak{P}_1(I)$ should be replaced with $\mathfrak{P}_1(n)$. Another application of the lower semi-continuity of the Fisher information finishes the proof.



4. Convolution of three densities of bounded variation



Although densities with finite Fisher information must be functions of bounded variation, the converse is not always true. Nevertheless, starting from a density of bounded variation and taking several convolutions with itself, the resulting density will have finite Fisher information. Our nearest aim is to prove:



Proposition 4.1. If independent random variables $X_1, X_2, X_3$ have densities $p_1, p_2, p_3$ with finite total variation, then $S = X_1 + X_2 + X_3$ has finite Fisher information, and moreover,

$$I(S) \le \frac{1}{2}\Big[\|p_1\|_{\rm TV}\|p_2\|_{\rm TV} + \|p_1\|_{\rm TV}\|p_3\|_{\rm TV} + \|p_2\|_{\rm TV}\|p_3\|_{\rm TV}\Big]. \tag{4.1}$$



One may further extend (4.1) to sums of more than 3 independent summands, but this will not be needed for our purposes (since the Fisher information may only decrease when adding an independent summand).

In the i.i.d. case the above estimate can be simplified. By a direct application of the inverse Fourier formula, the right-hand side of (4.1) may furthermore be related to the characteristic functions of the $X_j$. We will return to this in the next section.

First, let us look at the particular case where the $X_j$ are uniformly distributed over intervals. This important example already shows that the Fisher information $I(X_1 + X_2)$ need not be finite, while it is finite for 3 summands. (This somewhat curious fact was pointed out to one of the authors by K. Ball.) In fact, there is a simple quantitative bound.



Lemma 4.2. If independent random variables $X_1, X_2, X_3$ are uniformly distributed on intervals of lengths $a_1, a_2, a_3$, then

$$I(X_1 + X_2 + X_3) \le 2\Big(\frac{1}{a_1 a_2} + \frac{1}{a_1 a_3} + \frac{1}{a_2 a_3}\Big). \tag{4.2}$$




The density of the sum $S = X_1 + X_2 + X_3$ may easily be evaluated, which leads to a rather routine problem of estimating $I(S)$ as a function of the parameters $a_j$. Alternatively, there is an elegant approach based on general properties of so-called convex or hyperbolic distributions and the fact that the density $p$ of $S$ behaves like the beta density near the end points of the supporting interval.

To describe the argument, let us recall a few definitions and results concerning such measures. A probability measure $\mu$ on $\mathbb{R}^n$ is called $\kappa$-concave with a (convexity) parameter $0 < \kappa \le 1$ if it satisfies the Brunn-Minkowski-type inequality

$$\mu(tA + (1-t)B) \ge \big(t\mu(A)^{\kappa} + (1-t)\mu(B)^{\kappa}\big)^{1/\kappa}$$

in the class of all non-empty Borel sets $A, B \subset \mathbb{R}^n$, and for arbitrary $0 < t < 1$. We refer to the papers by Borell [Bor1-2] for basic properties of such measures, cf. also [Bo] (in fact, the values $\kappa \le 0$ are also allowed, but will not be needed here).

If $\mu$ is absolutely continuous, the definition reduces to the property that $\mu$ is supported on some open convex set $\Omega \subset \mathbb{R}^n$ (necessarily bounded), where it has a positive density $p$ such that the function $p^{\kappa/(1-n\kappa)}$ is concave on $\Omega$ (Borell's characterization theorem). For example, the normalized Lebesgue measure on any convex body is $\frac{1}{n}$-concave. In dimension one, $\mu$ has to be supported on some finite interval $(x_0, x_1)$, and Borell's description may also be given in terms of the function

$$L(t) = p(F^{-1}(t)), \quad 0 < t < 1,$$

where $F^{-1} : (0,1) \to (x_0, x_1)$ denotes the inverse of the distribution function $F(x) = \mu(x_0, x)$, restricted to the supporting interval. Namely (cf. [Bo]), a probability measure $\mu$ is $\kappa$-concave if and only if the function $L^{1/(1-\kappa)}$ is concave on $(0,1)$.

We only need the following well-known fact about the convexity parameter of convolutions, which we formulate in the case of three measures: if the $\mu_j$ are $\kappa_j$-concave ($j = 1, 2, 3$), then the measure $\mu = \mu_1 * \mu_2 * \mu_3$ is $\kappa$-concave, where

$$\frac{1}{\kappa} = \frac{1}{\kappa_1} + \frac{1}{\kappa_2} + \frac{1}{\kappa_3}. \tag{4.3}$$

Note also that the Fisher information of a random variable $X$ with density $p$ is expressed in terms of the associated function $L$ as

$$I(X) = \int_0^1 L'(t)^2\,dt. \tag{4.4}$$

(Indeed, substituting $t = F(x)$, we have $dt = p(x)\,dx$ and $L'(t) = p'(x)/p(x)$.) This general formula holds whenever $p$ is absolutely continuous and positive on the supporting interval (without any $\kappa$-concavity assumption).
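Formula (4.4) can be tested numerically. As an illustration with an unbounded supporting interval (our choice, not the paper's), the logistic distribution $F(x) = 1/(1+e^{-x})$ has $L(t) = t(1-t)$, so both (2.1) and (4.4) should give $I(X) = \int_0^1 (1-2t)^2\,dt = 1/3$. The sketch computes $L$ generically as $p(F^{-1}(t))$ and differentiates it by central differences (all names are ours):

```python
import math


def logistic_pdf(x):
    e = math.exp(-abs(x))                 # even symmetric form avoids overflow
    return e / (1.0 + e) ** 2


def logistic_dpdf(x):
    f = 1.0 / (1.0 + math.exp(-x))        # distribution function F(x)
    return logistic_pdf(x) * (1.0 - 2.0 * f)


def logistic_quantile(t):
    return math.log(t / (1.0 - t))


def midpoint(f, a, b, n):
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


# I(X) from the definition (2.1), truncated to [-40, 40]
direct = midpoint(lambda x: logistic_dpdf(x) ** 2 / logistic_pdf(x), -40.0, 40.0, 80000)


def L(t):
    """Associated function L(t) = p(F^{-1}(t))."""
    return logistic_pdf(logistic_quantile(t))


def dL(t, delta=1e-7):
    return (L(t + delta) - L(t - delta)) / (2 * delta)


# I(X) from formula (4.4)
via_quantile = midpoint(lambda t: dL(t) ** 2, 0.0, 1.0, 20000)
```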

Proof of Lemma 4.2. For definiteness, let $X_j$ take values in $[0, a_j]$. Since the distributions of the $X_j$ are 1-concave, the distribution of $S = X_1 + X_2 + X_3$ is $\frac{1}{3}$-concave, according to (4.3). This means that $S$ has a density $p$ such that $p^{1/2}$ is concave on the supporting interval $[0, a_1 + a_2 + a_3]$, or equivalently, $L^{3/2}$ is concave on $(0,1)$, where $L$ is the associated function for $S$.

Note that $S$ has an absolutely continuous density $p$, which thus vanishes at the end points $x = 0$ and $x = a_1 + a_2 + a_3$. Hence, $L(0+) = L(1-) = 0$. By the concavity, the Radon-Nikodym derivative $(L^{3/2})' = \frac{3}{2} L^{1/2} L'$ is non-increasing, and since $L$ is symmetric about the point $\frac{1}{2}$, we get, for all $0 < t < 1$,

$$L'(t)^2 L(t) \le c, \quad \text{where } c = \lim_{t \to 0} L'(t)^2 L(t).$$

Hence, by (4.4), and since $\int_0^1 \frac{dt}{L(t)} = \int \frac{p(x)\,dx}{p(x)}$ is the length of the supporting interval,

$$I(S) \le \int_0^1 \frac{c}{L(t)}\,dt = c\,(a_1 + a_2 + a_3). \tag{4.5}$$

It remains to find the constant $c$. Putting $a = a_1 a_2 a_3$, it should be clear that, for all $x > 0$ and $t > 0$ small enough,

$$F(x) = P\{S \le x\} = \frac{x^3}{6a}, \quad p(x) = \frac{x^2}{2a}, \quad F^{-1}(t) = (6at)^{1/3}, \quad L(t) = \frac{1}{2a}\,(6at)^{2/3},$$

and finally $c = L'(t)^2 L(t) = \frac{2}{a}$. Thus, in (4.5) we arrive at $I(S) \le \frac{2}{a}\,(a_1 + a_2 + a_3)$, which is exactly (4.2).
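A direct check of Lemma 4.2 in the symmetric case $a_1 = a_2 = a_3 = 1$ (our choice): the density of $S$ is piecewise quadratic, and an elementary integration gives $I(S) = 2\sqrt{3}\,\log(2+\sqrt{3}) \approx 4.56$, below the bound $6$ in (4.2). The Python sketch below (names are ours) also confirms numerically that $\sqrt{p}$ is concave, as predicted by $\frac{1}{3}$-concavity, and that (4.1) with $\|p_j\|_{\rm TV} = 2/a_j$ reproduces the right-hand side of (4.2).

```python
import math


def pdf(x):
    """Density of S = U1 + U2 + U3 with U_j ~ Uniform(0, 1) independent."""
    if 0.0 < x <= 1.0:
        return x * x / 2.0
    if 1.0 < x <= 2.0:
        return (-2.0 * x * x + 6.0 * x - 3.0) / 2.0
    if 2.0 < x < 3.0:
        return (3.0 - x) ** 2 / 2.0
    return 0.0


def dpdf(x):
    if 0.0 < x <= 1.0:
        return x
    if 1.0 < x <= 2.0:
        return 3.0 - 2.0 * x
    if 2.0 < x < 3.0:
        return x - 3.0
    return 0.0


def fisher_sum_of_uniforms(n=30000):
    """Midpoint approximation of I(S) = int p'^2 / p over (0, 3)."""
    h = 3.0 / n
    return h * sum(dpdf((i + 0.5) * h) ** 2 / pdf((i + 0.5) * h) for i in range(n))


def lemma42_bound(a1, a2, a3):
    return 2.0 * (1.0 / (a1 * a2) + 1.0 / (a1 * a3) + 1.0 / (a2 * a3))


def prop41_bound(tv1, tv2, tv3):
    return 0.5 * (tv1 * tv2 + tv1 * tv3 + tv2 * tv3)


def root(x):
    return math.sqrt(pdf(x))


I_S = fisher_sum_of_uniforms()
exact = 2.0 * math.sqrt(3.0) * math.log(2.0 + math.sqrt(3.0))

# concavity of sqrt(p) on (0, 3): second differences are <= 0 up to rounding
second_diffs = [root(x + 1e-3) - 2 * root(x) + root(x - 1e-3)
                for x in (k / 100 for k in range(1, 300))]
```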

Lemma 4.2 allows us to reduce Proposition 4.1 to the case of uniform distributions. Note that if a density $p$ is written as a convex mixture

$$p(x) = \int q(x)\,d\pi(q), \tag{4.6}$$

then by the convexity of the total variation norm,

$$\|p\|_{\rm TV} \le \int \|q\|_{\rm TV}\,d\pi(q). \tag{4.7}$$

Recall that we understand (4.6) as the equality (3.7) of the corresponding measures. So, (4.7) also uses our original agreement that, for each $x$, the value $p(x)$ lies in the closed segment with endpoints $p(x-)$ and $p(x+)$.

In order to apply Lemma 4.2 together with Jensen's inequality for the Fisher information, we need, however, first to require that $\pi$ be supported on uniform densities (that is, densities of normalized Lebesgue measures on finite intervals), and secondly to reverse (4.7). This indeed turns out to be possible, which may be a rather interesting observation.

Lemma 4.3. Any density $p$ of bounded variation can be represented as a convex mixture (4.6) of uniform densities with a mixing measure $\pi$ such that

$$\|p\|_{\rm TV} = \int \|q\|_{\rm TV}\,d\pi(q). \tag{4.8}$$

For example, if $p$ is supported and non-increasing on $(0, +\infty)$, there is a canonical representation

$$p(x) = \int_0^{+\infty} \frac{1}{b}\, 1_{\{0 < x < b\}}\,d\pi(b)$$

with a unique mixing probability measure $\pi$ on $(0, +\infty)$. In this case $\|p\|_{\rm TV} = 2p(0+)$, and (4.8) is obvious. One may write a similar representation for densities of unimodal distributions. In general, another way to write (4.6) and (4.8) is

$$p(x) = \int \frac{1}{x_1 - x_0}\, 1_{\{x_0 < x < x_1\}}\,d\pi(x_0, x_1), \qquad \|p\|_{\rm TV} = \int \frac{2}{x_1 - x_0}\,d\pi(x_0, x_1),$$

where $\pi$ is a Borel probability measure on the half-plane $x_1 > x_0$ (i.e., above the main diagonal).

Let us also note that the sets ${\rm BV}(c)$ of all densities $p$ with $\|p\|_{\rm TV} \le c$ are closed under
the weak convergence (3.3) of the corresponding probability distributions. Moreover,
the weak convergence in ${\rm BV}(c)$ coincides with convergence in $L^1$-norm, which can be
proved using the same arguments as in the proof of Proposition 3.2. In particular, the
functional $q \mapsto \|q\|_{\rm TV}$ is lower semi-continuous and hence Borel measurable on $\mathfrak{P}$, so
the integrals (4.7)-(4.8) make sense.

Denote by $U$ the collection of all uniform densities, which thus may be identified with
the half-plane $\tilde U = \{(a,b) \in \mathbb{R}^2 : b > a\}$ via the map $(a,b) \mapsto q_{a,b}(x) = \frac{1}{b-a}\,1_{\{a < x < b\}}$.
The usual convergence on $\tilde U$ in the Euclidean metric coincides with the weak convergence
(3.3) of $q_{a,b}$. The closure of $U$ for the weak topology contains $U$ and all delta-measures,
hence $U$ is a Borel measurable subset of $\mathfrak{P}$.

Proof. We only need the existence part which is proved below in two steps. 

Step 1. First consider the discrete case, where $p$ is piecewise constant, i.e., it is
supported and constant on consecutive semi-open intervals $\Delta_k = [x_{k-1}, x_k)$, $k = 1, \dots, n$,
where $x_0 < \dots < x_n$. Putting $p(x) = c_k$ on $\Delta_k$, we then have

$$\|p\|_{\rm TV} = c_1 + |c_2 - c_1| + \dots + |c_n - c_{n-1}| + c_n.$$

In this case the existence of the representation (4.6), moreover with a discrete
mixing measure $\pi$ satisfying (4.8), can be proved by induction on $n$. If $n = 1$ or $n = 2$,
then $p$ is monotone on $\Delta_1$, respectively on $\Delta_1 \cup \Delta_2$, and the statement is obvious.

If $n \ge 3$, one should distinguish between several cases. If $c_1 = 0$ or $c_n = 0$, we are
reduced to a smaller number of supporting intervals. If $c_k = 0$ for some $1 < k < n$, one
can write $p = f + g$ with $f(x) = p(x)\,1_{\{x < x_{k-1}\}}$, $g(x) = p(x)\,1_{\{x \ge x_k\}}$. These functions are
supported on disjoint half-axes, so $\|p\|_{\rm TV} = \|f\|_{\rm TV} + \|g\|_{\rm TV}$. Moreover, the induction
hypothesis may be applied to both $f$ and $g$ (or one can first normalize these functions
to work with densities, but this is less convenient). As a result,

$$f = f_1 + \dots + f_r, \qquad g = g_1 + \dots + g_l \quad \text{a.e.},$$

where each $f_i$ is supported and constant on some interval inside $[x_0, x_{k-1})$, each $g_j$ is
supported and constant on some interval inside $[x_k, x_n)$, and

$$\|f\|_{\rm TV} = \|f_1\|_{\rm TV} + \dots + \|f_r\|_{\rm TV}, \qquad \|g\|_{\rm TV} = \|g_1\|_{\rm TV} + \dots + \|g_l\|_{\rm TV}.$$

Hence,

$$p = \sum_i f_i + \sum_j g_j \quad \text{with} \quad \|p\|_{\rm TV} = \sum_i \|f_i\|_{\rm TV} + \sum_j \|g_j\|_{\rm TV}.$$

Finally, assume that $c_k > 0$ for all $k \le n$. Putting $c^* = \min_k c_k$, write $p = f + g$,
where $f = c^*\,1_{[x_0, x_n)}$ and $g$ thus takes the values $c_k - c^*$ on $\Delta_k$. Clearly,

$$\|p\|_{\rm TV} = 2c^* + \|g\|_{\rm TV} = \|f\|_{\rm TV} + \|g\|_{\rm TV}.$$

By definition, $g$ takes the value zero on one of the intervals $\Delta_k$ (where $c_k = c^*$), so we are
reduced to one of the previous cases. There we obtained a representation $g = g_1 + \dots + g_l$
such that $\|g\|_{\rm TV} = \|g_1\|_{\rm TV} + \dots + \|g_l\|_{\rm TV}$, where each $g_j$ is supported and constant on
some interval inside $[x_0, x_n)$. Hence,

$$p = f + \sum_j g_j \quad \text{with} \quad \|p\|_{\rm TV} = \|f\|_{\rm TV} + \sum_j \|g_j\|_{\rm TV}.$$

Although the measure $\pi$ has not been constructed explicitly, one may notice
that it should be supported on the densities of the form

$$q_{ij}(x) = \frac{1}{x_j - x_i}\,1_{\{x_i < x < x_j\}}, \qquad 0 \le i < j \le n.$$
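The induction in Step 1 is effectively an algorithm. The sketch below (our own rendering, with hypothetical function names) cuts a non-negative step function at its zero values, removes the minimum level $c^*$ on every maximal positive run, and recurses; each removed piece $h\,1_{[a,b)}$ has total variation $2h$, so the pieces' variations add up to $\|p\|_{\rm TV}$, as in (4.8). Normalizing the weights $h(b-a)$ turns the pieces into a convex mixture of uniform densities.

```python
def total_variation(cs):
    """TV norm of the step function equal to cs[k] on [x_k, x_{k+1}) and 0 outside."""
    vals = [0.0, *cs, 0.0]
    return sum(abs(b - a) for a, b in zip(vals, vals[1:]))


def decompose(xs, cs):
    """Split a non-negative step function into pieces h * 1_[a, b) without losing TV.

    Follows the induction of Step 1: cut at zero values, remove the minimum
    level on each maximal positive run, and recurse.  Returns (a, b, h) triples.
    """
    work = [float(c) for c in cs]
    pieces = []

    def recurse(lo, hi):
        k = lo
        while k < hi:
            if work[k] <= 0.0:  # a zero value splits the support into two parts
                k += 1
                continue
            j = k
            while j < hi and work[j] > 0.0:
                j += 1
            m = min(work[k:j])  # the level c* of the positive run work[k:j]
            pieces.append((xs[k], xs[j], m))
            for i in range(k, j):
                work[i] -= m
            recurse(k, j)
            k = j

    recurse(0, len(cs))
    return pieces


def as_mixture(pieces):
    """Normalize to a convex mixture of uniform densities: list of (weight, a, b)."""
    mass = sum(h * (b - a) for a, b, h in pieces)
    return [(h * (b - a) / mass, a, b) for a, b, h in pieces]


xs, cs = [0, 1, 2, 3, 4], [1, 3, 0, 2]
pieces = decompose(xs, cs)
print(pieces)  # [(0, 2, 1.0), (1, 2, 2.0), (3, 4, 2.0)]
print(sum(2 * h for _, _, h in pieces), total_variation(cs))  # 10.0 10.0
```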

Step 2. In the general case, one may assume that $p$ is right-continuous. Consider the
collection of piecewise constant densities of the form

$$\tilde p(x) = d \sum_{k=1}^n p(x_{k-1})\,1_{\{x_{k-1} \le x < x_k\}} \tag{4.9}$$

with arbitrary points $x_0 < \dots < x_n$ of continuity of $p$ such that $p(x_{k-1}) > 0$ for at least
one $k$, and where $d$ is a normalizing constant so that $\int_{-\infty}^{+\infty} \tilde p(x)\,dx = 1$. Since $p$ has
bounded total variation, it is possible to construct a sequence $p_n$ of the form (4.9) which
converges to $p$ in $L^1$-norm and with $d = d_n \to 1$. By construction,

$$\frac{1}{d_n}\,\|p_n\|_{\rm TV} = p(x_0) + p(x_{n-1}) + \sum_{k=1}^{n-1} |p(x_k) - p(x_{k-1})| \le \|p\|_{\rm TV}, \tag{4.10}$$

so all $p_n$ belong to ${\rm BV}(c)$ with some constant $c$.
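The inequality in (4.10) can be checked on a concrete density. In this sketch (our own example, with hypothetical helper names) $p$ is the standard normal density, for which $\|p\|_{\rm TV} = 2p(0)$; when the mode $0$ is one of the grid points, the middle expression in (4.10) telescopes to exactly $2p(0)$, and otherwise it is strictly smaller. The constant $d$ from (4.9) is also computed and is close to $1$ for fine grids.

```python
import math


def std_normal_pdf(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def discretization_tv(p, xs):
    """Middle expression in (4.10): p(x_0) + p(x_{n-1}) + sum_{k=1}^{n-1} |p(x_k) - p(x_{k-1})|."""
    vals = [p(x) for x in xs[:-1]]  # the values p(x_0), ..., p(x_{n-1}) used in (4.9)
    return vals[0] + vals[-1] + sum(abs(b - a) for a, b in zip(vals, vals[1:]))


def normalizing_constant(p, xs):
    """The constant d in (4.9): 1 / sum_k p(x_{k-1}) (x_k - x_{k-1})."""
    return 1.0 / sum(p(a) * (b - a) for a, b in zip(xs, xs[1:]))


true_tv = 2 * std_normal_pdf(0.0)  # ||p||_TV = 2 p(0) for the standard normal density
for n in (10, 100, 2000, 2001):
    xs = [-8 + 16 * k / n for k in range(n + 1)]
    print(n, discretization_tv(std_normal_pdf, xs), normalizing_constant(std_normal_pdf, xs), true_tv)
```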

Using the previous step, one can define discrete probability measures $\pi_n$ supported
on $U$ and such that

$$p_n(x) = \int_U q(x)\,d\pi_n(q), \qquad \|p_n\|_{\rm TV} = \int_U \|q\|_{\rm TV}\,d\pi_n(q). \tag{4.11}$$



Since $U$ has been identified with the half-plane $\tilde U$, replacing $d\pi_n(q)$ with $d\pi_n(a,b)$ should
not lead to confusion. In particular, the second equality in (4.11) may be written as

$$\|p_n\|_{\rm TV} = 2 \int_{\tilde U} \frac{d\pi_n(a,b)}{b-a}. \tag{4.12}$$

From the first equality in (4.11) it follows that, for any $T > 0$,

$$\int_U \Bigl(\int_{|x|>T} q(x)\,dx\Bigr)\,d\pi_n(q) = \int_{|x|>T} p_n(x)\,dx \le \int_{|x|>T} p(x)\,dx + \|p_n - p\|_1.$$

Hence, by Chebyshev's inequality, for any $\varepsilon_k > 0$,

$$\pi_n\Bigl\{q \in U : \int_{|x|>k} q(x)\,dx > \varepsilon_k\Bigr\} \le \frac{1}{\varepsilon_k}\Bigl(\int_{|x|>k} p(x)\,dx + \|p_n - p\|_1\Bigr). \tag{4.13}$$

Clearly, one can choose a sequence $\varepsilon_k \downarrow 0$ and an increasing sequence of indices $n_k$ such
that the right-hand side of (4.13) tends to zero, as $k \to \infty$, uniformly over all $n \ge n_k$.
In particular, the above inequality holds for $\pi_{n_k}$.

On the other hand (identifying $q$ with the corresponding probability distributions), by
the Prokhorov compactness criterion, the collection of densities

$$\Bigl\{q \in \mathfrak{P} : \int_{|x|>k} q(x)\,dx \le \varepsilon_k\Bigr\}$$

is pre-compact for the weak topology with convergence (3.3), cf. e.g. [Bi]. Therefore,
by the same criterion applied to $\mathfrak{P}$ as a Polish space, $\pi_n$ contains a weakly convergent
subsequence $\pi_{n_k}$ with some limit $\pi$. This measure is supported on the (weak) closure
of $U$, which is a larger set, since it contains delta-measures, or the main diagonal in $\mathbb{R}^2$,
if we identify $U$ with $\tilde U$. However, using (4.12) together with Chebyshev's inequality,
and then applying (4.10), we see that, for any $\varepsilon > 0$ and all $n \ge n_0$,

$$\pi_n\{(a,b) : b - a < \varepsilon\} = \pi_n\Bigl\{(a,b) : \frac{2}{b-a} > \frac{2}{\varepsilon}\Bigr\} \le \frac{\varepsilon}{2}\,\|p_n\|_{\rm TV} \le \varepsilon\,\|p\|_{\rm TV}.$$



Hence, $\pi$ is actually supported on $U$. Moreover, taking the limit along $n_k$ in the first
equality in (4.11), we obtain the representation (4.6).

Now, the sets $G(t) = \{q \in U : \|q\|_{\rm TV} > t\}$ are open in the weak topology (by the lower
semicontinuity of the total variation norm), hence $\liminf_{k\to\infty} \pi_{n_k}(G(t)) \ge \pi(G(t))$.
Applying Fatou's lemma and then again (4.10) and the second equality in (4.11), we get

$$\int_U \|q\|_{\rm TV}\,d\pi(q) = \int_0^{+\infty} \pi(G(t))\,dt \le \liminf_{k\to\infty} \int_0^{+\infty} \pi_{n_k}(G(t))\,dt = \liminf_{k\to\infty} \int_U \|q\|_{\rm TV}\,d\pi_{n_k}(q) = \liminf_{k\to\infty} \|p_{n_k}\|_{\rm TV} \le \|p\|_{\rm TV}.$$

In view of Jensen's inequality (4.7), we obtain (4.8) thus proving the existence part of 
the lemma. 
Proof of Proposition 4.1. We may write down the representation (4.6) from
Lemma 4.3 for each of the densities $p_j$ ($j = 1, 2, 3$). That is,

$$p_j(x) = \int q(x)\,d\pi_j(q) \quad \text{a.e.}$$

with some mixing probability measures $\pi_j$, supported on $U$ and satisfying

$$\|p_j\|_{\rm TV} = \int \|q\|_{\rm TV}\,d\pi_j(q). \tag{4.14}$$

Taking the convolution, we then have a similar representation

$$(p_1 * p_2 * p_3)(x) = \iiint (q_1 * q_2 * q_3)(x)\,d\pi_1(q_1)\,d\pi_2(q_2)\,d\pi_3(q_3) \quad \text{a.e.}$$

One can now use Jensen's inequality (3.8) for the Fisher information and apply (4.2) to
bound $I(p_1 * p_2 * p_3)$ from above by (for a uniform density $q$ on an interval of length $a_j$ one has
$\|q\|_{\rm TV} = 2/a_j$, so the right-hand side of (4.2) equals $\frac{1}{2}\sum_{i<j} \|q_i\|_{\rm TV}\|q_j\|_{\rm TV}$)

$$\frac{1}{2} \iiint \bigl[\|q_1\|_{\rm TV}\|q_2\|_{\rm TV} + \|q_1\|_{\rm TV}\|q_3\|_{\rm TV} + \|q_2\|_{\rm TV}\|q_3\|_{\rm TV}\bigr]\,d\pi_1(q_1)\,d\pi_2(q_2)\,d\pi_3(q_3).$$

In view of (4.14), the triple integral coincides with the right-hand side of (4.1).
Proposition 4.1 is proved.



5. Bounds in terms of characteristic functions 

In view of Proposition 4.1, let us describe how to bound the total variation norm of a
given density $p$ of a random variable $X$ in terms of the characteristic function $f(t) =
\mathbb{E}\,e^{itX}$. There are many different bounds depending on the integrability properties of $f$
and its derivatives, which may also depend on assumptions on the finiteness of moments
of $X$. We shall present two of them here.

Recall that, if $p$ is absolutely continuous, then

$$\|p\|_{\rm TV} = \int_{-\infty}^{+\infty} |p'(x)|\,dx.$$



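Proposition 5.1. If $X$ has finite second moment and

$$\int_{-\infty}^{+\infty} |t|\,\bigl(|f(t)| + |f'(t)| + |f''(t)|\bigr)\,dt < +\infty, \tag{5.1}$$

then $X$ has a continuously differentiable density $p$ with finite total variation

$$\|p\|_{\rm TV} \le \frac{1}{2} \int_{-\infty}^{+\infty} \bigl(|t f''(t)| + 2|f'(t)| + |t f(t)|\bigr)\,dt. \tag{5.2}$$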



Proof. The argument is standard, and we recall it here for completeness. 
First, by the moment assumption, $f$ is twice continuously differentiable. The
assumption (5.1) implies that $X$ has a continuously differentiable density

$$p(x) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} e^{-itx} f(t)\,dt \tag{5.3}$$

with derivative

$$p'(x) = -\frac{i}{2\pi} \int_{-\infty}^{+\infty} e^{-itx}\,t f(t)\,dt. \tag{5.4}$$

Necessarily $f(t) \to 0$, as $|t| \to +\infty$, and the same is true for $f'(t)$ and $f''(t)$. There-
fore, one may integrate in (5.3) by parts to get, for all $x \in \mathbb{R}$,

$$x\,p(x) = -\frac{i}{2\pi} \int_{-\infty}^{+\infty} e^{-itx} f'(t)\,dt \tag{5.5}$$

and

$$x^2 p(x) = -\frac{1}{2\pi} \int_{-\infty}^{+\infty} e^{-itx} f''(t)\,dt.$$

By (5.1), we are allowed to differentiate the last equality by performing differentiation
under the integral sign, which together with (5.4) and (5.5) gives

$$(1 + x^2)\,p'(x) = \frac{i}{2\pi} \int_{-\infty}^{+\infty} e^{-itx}\,\bigl(t f''(t) + 2f'(t) - t f(t)\bigr)\,dt.$$

Hence, $|p'(x)| \le \frac{C}{2\pi(1+x^2)}$ with a constant $C$ described as the integral in (5.2). After
integration of this pointwise bound (recall that $\int_{-\infty}^{+\infty} \frac{dx}{1+x^2} = \pi$), the proposition follows.
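To get a feel for how tight (5.2) is, here is a small NumPy check for $X \sim N(0,1)$, where $f(t) = e^{-t^2/2}$, $f'(t) = -t f(t)$ and $f''(t) = (t^2-1) f(t)$. By our own computation the right-hand side of (5.2) equals $2 + 4e^{-1/2} \approx 4.43$ in this case, while $\|p\|_{\rm TV} = 2\varphi(0) \approx 0.80$. The trapezoid helper is written by hand to avoid depending on `np.trapz`/`np.trapezoid` naming across NumPy versions.

```python
import math

import numpy as np


def trapezoid(y, t):
    """Composite trapezoidal rule on the grid t."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(t)) / 2)


t = np.linspace(-20.0, 20.0, 400_001)
f = np.exp(-t**2 / 2)    # characteristic function of N(0, 1)
f1 = -t * f              # f'(t)
f2 = (t**2 - 1) * f      # f''(t)

rhs_5_2 = 0.5 * trapezoid(np.abs(t * f2) + 2 * np.abs(f1) + np.abs(t * f), t)
tv_normal = 2 / math.sqrt(2 * math.pi)  # ||p||_TV = 2 phi(0)
print(rhs_5_2, 2 + 4 * math.exp(-0.5), tv_normal)
```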

One can get rid of the assumption that the second derivative exists in the bound above
and remove any moment assumption in Proposition 5.1. But we still need to insist on
the corresponding integrability requirements for the characteristic function, including its
differentiability on the positive half-axis.

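Proposition 5.2. Assume the characteristic function $f(t)$ of a random variable $X$
has a continuous derivative for $t > 0$, with

$$\int_{-\infty}^{+\infty} \bigl(|t f(t)|^2 + |(t f(t))'|^2\bigr)\,dt < +\infty. \tag{5.6}$$

Then $X$ has an absolutely continuous distribution with density $p$ of bounded total vari-
ation such that

$$\|p\|_{\rm TV} \le \sqrt{2}\,\Bigl(\int_{-\infty}^{+\infty} |t f(t)|^2\,dt \int_{-\infty}^{+\infty} |(t f(t))'|^2\,dt\Bigr)^{1/4}. \tag{5.7}$$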
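Proof. First assume additionally that $f$ and $f'$ decay at infinity sufficiently fast
(so that $t f(t) \to 0$, as $|t| \to +\infty$). Integrating by parts in (5.4) and since $(t f(t))'$ is
integrable near zero, we get a similar representation

$$x\,p'(x) = -\frac{1}{2\pi} \int_{-\infty}^{+\infty} e^{-itx}\,(t f(t))'\,dt.$$

As usual, write $|p'(x)| = \frac{1}{|1+ix|}\,|(1 + ix)\,p'(x)|$ and use Cauchy's inequality together with
Plancherel's formula, to get

$$\|p\|_{\rm TV} \le \Bigl(\int_{-\infty}^{+\infty} \frac{dx}{1+x^2}\Bigr)^{1/2} \Bigl(\int_{-\infty}^{+\infty} |(1+ix)\,p'(x)|^2\,dx\Bigr)^{1/2} \le \Bigl(\int_{-\infty}^{+\infty} \bigl[|t f(t)|^2 + |(t f(t))'|^2\bigr]\,dt\Bigr)^{1/2}.$$

Applying the same inequality to $\lambda X$ and optimizing over $\lambda > 0$, we arrive at (5.7).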

In the general case, one may apply (5.7) to the regularized random variables $X_\sigma =
X + \sigma Z$ with small parameters $\sigma > 0$, where $Z \sim N(0,1)$ is independent of $X$. They
have smooth densities $p_\sigma$ and characteristic functions $f_\sigma(t) = f(t)\,e^{-\sigma^2 t^2/2}$. Repeating
the previous argument for the difference of densities, we obtain an analogue of (5.7),

$$\|p_{\sigma_1} - p_{\sigma_2}\|_{\rm TV} \le \sqrt{2}\,\Bigl(\int_{-\infty}^{+\infty} |t(f_{\sigma_1}(t) - f_{\sigma_2}(t))|^2\,dt \int_{-\infty}^{+\infty} |(t(f_{\sigma_1}(t) - f_{\sigma_2}(t)))'|^2\,dt\Bigr)^{1/4} \tag{5.8}$$

with arbitrary $\sigma_1, \sigma_2 > 0$. Since the integrals in (5.7) are finite, by the Lebesgue
dominated convergence theorem, the right-hand side of (5.8) tends to zero, as long
as $\sigma_1, \sigma_2 \to 0$. Hence, the family $\{p_\sigma\}$ is fundamental (Cauchy) as $\sigma \to 0$ in the Banach
space of all functions of bounded variation on the real line that vanish at infinity.
As a result, there exists the limit $p = \lim_{\sigma \to 0} p_\sigma$ in this space in total variation norm.

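Necessarily, $p(x) \ge 0$ for all $x$, and $\int_{-\infty}^{+\infty} p(x)\,dx = 1$. Hence, $X$ has an absolutely
continuous distribution with density $p$. In addition, by (5.7) applied to $p_\sigma$,

$$\|p\|_{\rm TV} = \lim_{\sigma \to 0} \|p_\sigma\|_{\rm TV} \le \sqrt{2}\,\lim_{\sigma \to 0} \Bigl(\int_{-\infty}^{+\infty} |t f_\sigma(t)|^2\,dt \int_{-\infty}^{+\infty} |(t f_\sigma(t))'|^2\,dt\Bigr)^{1/4}.$$

The last limit exists and coincides with the right-hand side of (5.7).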
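The scaling step "apply the inequality to $\lambda X$ and optimize" can be made concrete. Writing $A = \int |tf(t)|^2\,dt$ and $B = \int |(tf(t))'|^2\,dt$, the substitution $X \mapsto \lambda X$ turns the preceding bound into $\|p\|_{\rm TV} \le (A/\lambda + \lambda B)^{1/2}$ (this intermediate form is our own rewriting), whose minimum over $\lambda > 0$, attained at $\lambda = \sqrt{A/B}$, is $\sqrt{2}\,(AB)^{1/4}$. The sketch checks this numerically for $N(0,1)$, where by our computation $A = \sqrt{\pi}/2$ and $B = 3\sqrt{\pi}/4$.

```python
import math

import numpy as np


def trapezoid(y, t):
    """Composite trapezoidal rule on the grid t."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(t)) / 2)


t = np.linspace(-20.0, 20.0, 400_001)
f = np.exp(-t**2 / 2)          # characteristic function of N(0, 1)
tf = t * f
tf_prime = (1 - t**2) * f      # (t f(t))'

A = trapezoid(tf**2, t)        # closed form: sqrt(pi) / 2
B = trapezoid(tf_prime**2, t)  # closed form: 3 sqrt(pi) / 4

lams = np.linspace(0.05, 5.0, 200_001)
scaled = np.sqrt(A / lams + lams * B)  # bound for lambda X, before optimizing over lambda
best = float(scaled.min())
best_lam = float(lams[np.argmin(scaled)])
rhs_5_7 = math.sqrt(2) * (A * B) ** 0.25
print(A, B, best, best_lam, rhs_5_7, 2 / math.sqrt(2 * math.pi))
```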

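Corollary 5.3. If the independent random variables $X_1, X_2, X_3$ have finite first
absolute moment and a common characteristic function $f(t)$, then

$$I(X_1 + X_2 + X_3) \le 3\,\Bigl(\int_{-\infty}^{+\infty} |t f(t)|^2\,dt \int_{-\infty}^{+\infty} |(t f(t))'|^2\,dt\Bigr)^{1/2}.$$

If $X_1$ has finite second moment, we also have

$$I(X_1 + X_2 + X_3) \le \frac{3}{8}\,\Bigl(\int_{-\infty}^{+\infty} \bigl(|t f''(t)| + 2|f'(t)| + |t f(t)|\bigr)\,dt\Bigr)^2.$$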
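The constants in Corollary 5.3 come from Proposition 4.1 with $p_1 = p_2 = p_3 = p$. Assuming, as the end of the proof of Proposition 4.1 indicates, that the right-hand side of (4.1) is $\frac{1}{2}\bigl(\|p_1\|_{\rm TV}\|p_2\|_{\rm TV} + \|p_1\|_{\rm TV}\|p_3\|_{\rm TV} + \|p_2\|_{\rm TV}\|p_3\|_{\rm TV}\bigr)$, the arithmetic reads:

```latex
% p_1 = p_2 = p_3 = p in the right-hand side of (4.1):
I(X_1 + X_2 + X_3) \le \tfrac{1}{2}\bigl(3\,\|p\|_{\rm TV}^2\bigr) = \tfrac{3}{2}\,\|p\|_{\rm TV}^2 ,
% combined with (5.7):
\tfrac{3}{2}\cdot 2\,\Bigl(\int |tf(t)|^2\,dt \int |(tf(t))'|^2\,dt\Bigr)^{1/2}
  = 3\,\Bigl(\int |tf(t)|^2\,dt \int |(tf(t))'|^2\,dt\Bigr)^{1/2},
% combined with (5.2):
\tfrac{3}{2}\cdot\tfrac{1}{4}\,\Bigl(\int \bigl(|tf''(t)| + 2|f'(t)| + |tf(t)|\bigr)\,dt\Bigr)^2
  = \tfrac{3}{8}\,\Bigl(\int \bigl(|tf''(t)| + 2|f'(t)| + |tf(t)|\bigr)\,dt\Bigr)^2 .
```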
6. Classes of densities representable as convolutions

General bounds like those in Proposition 2.1 may be considerably sharpened in the case
where $p$ is representable as a convolution of several densities with finite Fisher information.

Definition 6.1. Given an integer $k \ge 1$ and a real number $I > 0$, denote by $\mathfrak{P}_k(I)$
the collection of all functions $p$ on the real line which can be represented as convolution
of $k$ probability densities with Fisher information at most $I$.

Correspondingly, let $\mathfrak{P}_k$ denote the collection of all functions $p$ representable as con-
volution of $k$ probability densities with finite Fisher information.

The collection $\mathfrak{P}_1$ of all densities with finite Fisher information has already been
discussed in connection with general properties of the functional $I$. For growing $k$, the
classes $\mathfrak{P}_k(I)$ decrease, since the Fisher information may only decrease when adding an
independent summand. This also follows from the following general inequality of Stam

$$\frac{1}{I(X+Y)} \ge \frac{1}{I(X)} + \frac{1}{I(Y)}, \tag{6.1}$$

which holds for all independent random variables (cf. [St], [Bl], [J]). Moreover, it implies
that $p = p_1 * \dots * p_k \in \mathfrak{P}_1(I/k)$, as long as $p_i \in \mathfrak{P}_1(I)$, $i = 1, \dots, k$.

Any function $p$ in $\mathfrak{P}_k$ is $k - 1$ times differentiable, and its $(k-1)$-th derivative is
absolutely continuous and has a Radon-Nikodym derivative, denoted by $p^{(k)}$. Let us
illustrate this property in the important case $k = 2$. Write

$$p(x) = \int_{-\infty}^{+\infty} p_1(x - y)\,p_2(y)\,dy \tag{6.2}$$

in terms of absolutely continuous densities $p_1$ and $p_2$ of independent summands $X_1$ and
$X_2$ of a random variable $X$ with density $p$. Differentiating under the integral sign, we
obtain a Radon-Nikodym derivative of the function $p$,

$$p'(x) = \int_{-\infty}^{+\infty} p_1'(x - y)\,p_2(y)\,dy = \int_{-\infty}^{+\infty} p_1'(y)\,p_2(x - y)\,dy. \tag{6.3}$$

The latter expression shows that $p'$ is absolutely continuous and has a Radon-Nikodym
derivative

$$p''(x) = \int_{-\infty}^{+\infty} p_1'(y)\,p_2'(x - y)\,dy, \tag{6.4}$$

which is well-defined for all $x$. In other words, $p''$ appears as the convolution of the
functions $p_1'$ and $p_2'$ (which are integrable, according to Proposition 2.2).

These formulas may be used to derive a number of elementary relations within the
class $\mathfrak{P}_k$, and here we shall describe some of them for the cases $\mathfrak{P}_2$ and $\mathfrak{P}_3$.
Proposition 6.2. Given a density $p \in \mathfrak{P}_2(I)$, for all $x \in \mathbb{R}$,

$$|p'(x)| \le I^{3/4}\sqrt{p(x)} \le I. \tag{6.5}$$

Moreover, $p'$ has finite total variation

$$\int_{-\infty}^{+\infty} |p''(x)|\,dx \le I.$$

The last bound immediately follows from (6.4) and Proposition 2.2. To obtain the
pointwise bound on the derivative, we may appeal to Proposition 2.1 and rewrite the
first equality in (6.3) as

$$p'(x) = \int_{-\infty}^{+\infty} \frac{p_1'(x-y)}{\sqrt{p_1(x-y)}}\,1_{\{p_1(x-y)>0\}}\,\sqrt{p_1(x-y)}\,p_2(y)\,dy.$$

Using Cauchy's inequality, we get

$$p'(x)^2 \le \int_{-\infty}^{+\infty} \frac{p_1'(x-y)^2}{p_1(x-y)}\,1_{\{p_1(x-y)>0\}}\,dy \int_{-\infty}^{+\infty} p_1(x-y)\,p_2(y)^2\,dy \le I(X_1)\,\max_y p_2(y) \int_{-\infty}^{+\infty} p_1(x-y)\,p_2(y)\,dy \le I(X_1)\,I(X_2)^{1/2}\,p(x),$$

where we applied Proposition 2.2 to the random variable $X_2$ in the last step. This gives
the first inequality in (6.5), while the second follows from $p(x) \le I^{1/2}$.
Now, we state similar bounds for the second derivative.

Proposition 6.3. For any density p G ^2(-^); we have p{x) = =^ p"{x) — and 
\p"{x)\ < P/^, for all X. In addition, 



L 



{p(x)>0} 



p{x) 



Proof. Let us start with the representation (6.4) for a fixed value $x \in \mathbb{R}$. Note that
the function $p_1'(x - y)\,p_2'(y)$ appearing in this formula is continuous in $y$. By Proposition
2.1, the integral in (6.4) may be restricted to the set $\{y : p_2(y) > 0\}$. For the same
reason, it may also be restricted to the set $\{y : p_1(x - y) > 0\}$. Hence,

$$p''(x) = \int_{-\infty}^{+\infty} p_1'(x - y)\,p_2'(y)\,1_A(y)\,dy, \tag{6.6}$$

where $A = \{y : p_1(x - y)\,p_2(y) > 0\}$. On the other hand, by the definition (6.2), the assump-
tion $p(x) = 0$ implies that $p_1(x - y)\,p_2(y) = 0$ for almost all $y$. Therefore, $1_A(y) = 0$ a.e.,
and thus the integral (6.6) vanishes, that is, $p''(x) = 0$.

Using the representation (6.4), the bound $|p''(x)| \le I^{3/2}$ follows from the uniform
bound (6.5) on $p'$ and the integral bound of Proposition 2.2.
Next, introduce the functions $u_i(x) = \frac{p_i'(x)}{\sqrt{p_i(x)}}\,1_{\{p_i(x)>0\}}$ ($i = 1, 2$) and rewrite (6.4) as

$$p''(x) = \int_{-\infty}^{+\infty} u_1(x - y)\,u_2(y)\,\sqrt{p_1(x - y)\,p_2(y)}\,dy.$$

By Cauchy's inequality,

$$p''(x)^2 \le \int_{-\infty}^{+\infty} u_1(x - y)^2\,u_2(y)^2\,dy \int_{-\infty}^{+\infty} p_1(x - y)\,p_2(y)\,dy = u(x)^2\,p(x), \tag{6.7}$$

where we used the function $u \ge 0$ given by

$$u(x)^2 = \int_{-\infty}^{+\infty} u_1(x - y)^2\,u_2(y)^2\,dy. \tag{6.8}$$

Clearly,

$$\int_{-\infty}^{+\infty} u(x)^2\,dx = I(X_1)\,I(X_2) \le I^2,$$

and since $p''(x)^2/p(x) \le u(x)^2$ on the set $\{p(x) > 0\}$ by (6.7), this is the inequality of the proposition.

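Proposition 6.4. Given a density $p \in \mathfrak{P}_3(I)$, we have, for all $x$,

$$|p''(x)| \le I^{5/4}\sqrt{p(x)}.$$

Indeed, by the assumption, one may write $p = p_1 * p_2$ with $p_1 \in \mathfrak{P}_1(I)$ and $p_2 \in \mathfrak{P}_2(I)$.
Returning to (6.7)-(6.8) and applying Proposition 6.2 to $p_2$, we get $u_2(y) \le I^{3/4}$, so

$$u(x)^2 \le I^{3/2} \int_{-\infty}^{+\infty} u_1(x - y)^2\,dy \le I^{5/2},$$

and it remains to apply (6.7).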
■oo 



7. Bounds under moment assumptions 

Another way to sharpen the bounds obtained in Section 2 for general densities with
finite Fisher information is to invoke conditions on the absolute moments

$$\beta_s = \beta_s(X) = \mathbb{E}\,|X|^s \qquad (s > 0 \text{ real}).$$

By Proposition 2.1 and Cauchy's inequality, if the Fisher information is finite,

$$\int_{-\infty}^{+\infty} |x|^s\,|p'(x)|\,dx = \int_{\{p(x)>0\}} |x|^s\,p(x)^{1/2}\,\frac{|p'(x)|}{p(x)^{1/2}}\,dx \le \Bigl(\int_{\{p(x)>0\}} |x|^{2s}\,p(x)\,dx\Bigr)^{1/2} \Bigl(\int_{\{p(x)>0\}} \frac{p'(x)^2}{p(x)}\,dx\Bigr)^{1/2}.$$

Hence, we arrive at:
Proposition 7.1. If $X$ has an absolutely continuous density $p$, then, for any $s > 0$,

$$\int_{-\infty}^{+\infty} |x|^s\,|p'(x)|\,dx \le \sqrt{\beta_{2s}\,I(X)}.$$

This bound holds regardless of whether the Fisher information or the $2s$-th absolute
moment $\beta_{2s}$ is finite.
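A quick illustration of Proposition 7.1 (our own example): for the Laplace density $p(x) = \frac{1}{2}e^{-|x|}$ one has $|p'(x)| = p(x)$ for $x \ne 0$, $I(X) = 1$ and $\beta_r = \Gamma(r+1)$, so the proposition reduces to $\Gamma(s+1) \le \sqrt{\Gamma(2s+1)}$. The sketch computes the left-hand side by quadrature and compares it with the closed forms.

```python
import math

import numpy as np


def trapezoid(y, x):
    """Composite trapezoidal rule on the grid x."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2)


x = np.linspace(0.0, 80.0, 800_001)
p = 0.5 * np.exp(-x)  # Laplace density on the half-line x >= 0 (the full density is even)

results = []
for s in (0.5, 1.0, 2.0, 3.5):
    # |p'(x)| = p(x) for x != 0; the factor 2 accounts for the negative half-line
    lhs = 2 * trapezoid(x**s * p, x)
    rhs = math.sqrt(math.gamma(2 * s + 1) * 1.0)  # sqrt(beta_{2s} I(X)) with I(X) = 1
    results.append((s, lhs, rhs))
    print(s, lhs, math.gamma(s + 1), rhs)
```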

Below we describe several applications of this proposition. 

First, let us note that, when $s \ge 1$, the function $u(x) = (1 + |x|^s)\,p(x)$ is (locally)
absolutely continuous and has a Radon-Nikodym derivative satisfying

$$|u'(x)| \le s\,|x|^{s-1}\,p(x) + (1 + |x|^s)\,|p'(x)|.$$

Integrating this inequality and assuming that both $I(X)$ and $\beta_{2s}$ are finite, we see that
$u$ is a function of bounded variation. Since $u$ is integrable as well, we have

$$u(-\infty) = \lim_{x \to -\infty} u(x) = 0, \qquad u(+\infty) = \lim_{x \to +\infty} u(x) = 0.$$

Therefore, applying Propositions 2.2 and 7.1, we get

$$u(x) = \int_{-\infty}^x u'(y)\,dy \le \int_{-\infty}^{+\infty} |u'(y)|\,dy \le s \int_{-\infty}^{+\infty} |x|^{s-1}\,p(x)\,dx + \int_{-\infty}^{+\infty} (1 + |x|^s)\,|p'(x)|\,dx \le s\,\beta_{s-1} + \sqrt{I(X)} + \sqrt{\beta_{2s}\,I(X)}.$$

In addition, $u(x) \to 0$ as $|x| \to \infty$. One can summarize.

Corollary 7.2. If $X$ has density $p$, then, given $s \ge 1$, for any $x \in \mathbb{R}$,

$$p(x) \le \frac{C}{1 + |x|^s}$$

with a constant $C = s\,\beta_{s-1} + \sqrt{I(X)} + \sqrt{\beta_{2s}\,I(X)}$. If this constant is finite, we also have

$$\lim_{|x| \to \infty} (1 + |x|^s)\,p(x) = 0.$$



In the resulting inequality no requirements on the density are needed. 
Applying Proposition 7.1 and Corollary 7.2 (the last assertion) with s = 1, we obtain 
the following sharpening of Corollary 2.3. 

Corollary 7.3. If $X$ has finite second moment and finite Fisher information $I(X)$,
then for its characteristic function $f(t) = \mathbb{E}\,e^{itX}$ we have

$$|f'(t)| \le \frac{C}{|t|}, \qquad t \ne 0,$$

with constant $C = 1 + \sqrt{\beta_2\,I(X)}$.

Indeed, if $p$ is the density of $X$ and $t \ne 0$, one may integrate by parts

$$f'(t) = \int_{-\infty}^{+\infty} e^{itx}\,(ix)\,p(x)\,dx = \frac{1}{t} \int_{-\infty}^{+\infty} x\,p(x)\,de^{itx} = -\frac{1}{t} \int_{-\infty}^{+\infty} \bigl(p(x) + x\,p'(x)\bigr)\,e^{itx}\,dx,$$

which yields $|t f'(t)| \le 1 + \sqrt{\beta_2\,I(X)}$.
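For a concrete instance of Corollary 7.3 (our own example), take the Laplace law with density $\frac{1}{2}e^{-|x|}$: then $f(t) = 1/(1+t^2)$, $\beta_2 = 2$ and $I(X) = 1$, so the corollary predicts $|t f'(t)| \le 1 + \sqrt{2}$. Here $|t f'(t)| = 2t^2/(1+t^2)^2$, whose maximum $1/2$ is attained at $|t| = 1$; the sketch confirms this on a grid.

```python
import math

import numpy as np

# Laplace law: p(x) = exp(-|x|) / 2, f(t) = 1 / (1 + t^2), beta_2 = 2, I(X) = 1.
t = np.linspace(-50.0, 50.0, 1_000_001)
f_prime = -2 * t / (1 + t**2) ** 2
t_f_prime = np.abs(t * f_prime)  # |t f'(t)| = 2 t^2 / (1 + t^2)^2
C = 1 + math.sqrt(2.0 * 1.0)     # 1 + sqrt(beta_2 I(X))
lhs_max = float(t_f_prime.max())
print(lhs_max, C)                # the maximum 1/2 is attained at |t| = 1
```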

Under stronger moment assumptions, one can obtain better bounds in comparison
with Corollary 7.2. For example, if for some $\lambda > 0$ the exponential moment

$$\beta = \int_{-\infty}^{+\infty} e^{\lambda |x|}\,p(x)\,dx$$

is finite, then by similar arguments, for any $x \in \mathbb{R}$, we have $p(x) \le C\,e^{-\lambda |x|/2}$ with some
constant $C$ depending on $\lambda$, $\beta$ and $I(X)$.



8. Fisher information in terms of the second derivative 

It will be convenient to work with the formula for the Fisher information involving the 
second derivative of the density. We state it for convolutions of two densities with finite 
Fisher information. 

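Proposition 8.1. If a random variable $X$ has density $p \in \mathfrak{P}_2$, then

$$I(X) = -\int_{-\infty}^{+\infty} p''(x)\,\log p(x)\,dx, \tag{8.1}$$

provided that

$$\int_{-\infty}^{+\infty} |p''(x)\,\log p(x)|\,dx < +\infty. \tag{8.2}$$

The latter condition holds if $\mathbb{E}\,|X|^s < +\infty$ for some $s > 2$.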

Strictly speaking, the integration in (8.1)-(8.2) should be performed over the set
$\{x : p(x) > 0\}$. One may extend this integration to the whole real line by using the
convention $0 \log 0 = 0$. This is consistent with the property that $p''(x) = 0$ as soon as
$p(x) = 0$ (according to Proposition 6.3).

Proof. The assumption $p \in \mathfrak{P}_2$ ensures that $p$ has an absolutely continuous deriv-
ative $p'$ with Radon-Nikodym derivative $p''$. By Proposition 6.2, $p'$ has bounded total
variation, which justifies integration by parts.
More precisely, assuming that $p \in \mathfrak{P}_2(I)$, let us decompose the open set $\{x : p(x) > 0\}$
into disjoint open intervals $(a_n, b_n)$, bounded or not. In particular, $p(a_n) = p(b_n) = 0$,
and by the bound (6.5) of Proposition 6.2,

$$|p'(x)\,\log p(x)| \le I^{3/4}\sqrt{p(x)}\,|\log p(x)| \to 0, \qquad \text{as } x \downarrow a_n,$$

and similarly for $b_n$. Integrating by parts, we get, for $a_n < T_1 < T_2 < b_n$,

$$\int_{T_1}^{T_2} \frac{p'(x)^2}{p(x)}\,dx = \int_{T_1}^{T_2} p'(x)\,d\log p(x) = p'(x)\,\log p(x)\Big|_{x=T_1}^{T_2} - \int_{T_1}^{T_2} p''(x)\,\log p(x)\,dx.$$

Letting $T_1 \downarrow a_n$ and $T_2 \uparrow b_n$, we get

$$\int_{a_n}^{b_n} \frac{p'(x)^2}{p(x)}\,dx = -\int_{a_n}^{b_n} p''(x)\,\log p(x)\,dx,$$

where the second integral is understood in the improper sense. It remains to perform
summation over $n$ on the basis of (8.2), and then we obtain (8.1).

To verify the integrability condition (8.2), one may apply the integral bound of Propo-
sition 6.3. Namely, using Cauchy's inequality, for the integral in (8.2) we have

$$\Bigl(\int_{-\infty}^{+\infty} |p''(x)|\,|\log p(x)|\,dx\Bigr)^2 \le I^2 \int_{-\infty}^{+\infty} p(x)\,\log^2 p(x)\,dx.$$

If the moment $\beta_s = \mathbb{E}\,|X|^s$ is finite, Corollary 7.2 yields

$$p(x)\,\log^2 p(x) \le C\,\frac{\log^2(e + |x|)}{(1 + |x|)^{s/2}}$$

with a constant $C$ depending on $I$ and $\beta_s$. The latter function is integrable in case $s > 2$,
so the integral in (8.2) is finite. Proposition 8.1 is proved.

Of course, for smooth positive p, (8.1) remains valid without additional assumptions. 
However, then the integral should be understood in the improper sense (it exists and is 
finite, as long as X has finite Fisher information). 
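As a numerical illustration of (8.1) for a smooth positive density, the sketch below uses the logistic law $p(x) = 1/(4\cosh^2(x/2))$ (our choice of example), for which $p'/p = -\tanh(x/2)$, $(p'/p)' = -\frac{1}{2}\cosh^{-2}(x/2)$ and $I(X) = \mathbb{E}\tanh^2(X/2) = \frac{1}{3}$, since $\tanh(X/2) = 2F(X) - 1$ is uniform on $(-1,1)$. Both $\int p'^2/p$ and $-\int p'' \log p$ should come out close to $1/3$; `np.logaddexp` is used to evaluate $\log\cosh$ without overflow.

```python
import numpy as np


def trapezoid(y, x):
    """Composite trapezoidal rule on the grid x."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2)


x = np.linspace(-40.0, 40.0, 400_001)
log_cosh = np.logaddexp(x / 2, -x / 2) - np.log(2.0)  # log cosh(x/2), overflow-safe
log_p = -np.log(4.0) - 2.0 * log_cosh                 # p(x) = 1 / (4 cosh^2(x/2))
p = np.exp(log_p)
score = -np.tanh(x / 2)                               # p'(x) / p(x)
sech2 = np.exp(-2.0 * log_cosh)                       # 1 / cosh^2(x/2)
d2p = p * (score**2 - 0.5 * sech2)                    # p'' = p ((p'/p)^2 + (p'/p)')

fisher_direct = trapezoid(score**2 * p, x)            # int p'^2 / p
fisher_via_8_1 = -trapezoid(d2p * log_p, x)           # -int p'' log p, as in (8.1)
print(fisher_direct, fisher_via_8_1)                  # both close to 1/3
```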

In order to involve the standard moment assumption, namely the finiteness of the second
moment, we consider densities representable as convolutions of more than two densities
with finite Fisher information.

Proposition 8.2. If a random variable $X$ has finite second moment and density
$p \in \mathfrak{P}_5$, then condition (8.2) holds, and $X$ has Fisher information given by (8.1).



To show that (8.2) is fulfilled, it suffices to prove the following pointwise bounds 
which are of independent interest. 
Proposition 8.3. If $\mathbb{E}X^2 \le 1$ and $X$ has density $p \in \mathfrak{P}_5(I)$, then with some absolute
constant $C$, for all $x$,

$$|p''(x)| \le C\,\frac{I^{5/2}}{1 + x^2} \tag{8.3}$$

and

$$|p''(x)\,\log p(x)| \le C\,I^{5/2}\,(1 + \log I)\,\frac{\log(e + |x|)}{1 + x^2}. \tag{8.4}$$

Proof. The assumption $\mathbb{E}X^2 \le 1$ implies $I \ge 1$ (by the Cramér-Rao inequality). Also,
the characteristic function $f(t) = \mathbb{E}\,e^{itX}$ is twice differentiable, and by Corollary 2.3, it
satisfies

$$|f(t)| \le \frac{I^{5/2}}{|t|^5}.$$

Hence, $p$ may be described as the inverse Fourier transform

$$p(x) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} e^{-itx} f(t)\,dt,$$

and a similar representation is also valid for the second derivative,

$$p''(x) = -\frac{1}{2\pi} \int_{-\infty}^{+\infty} e^{-itx}\,t^2 f(t)\,dt. \tag{8.5}$$

Write $X = X_1 + \dots + X_5$ with independent summands such that $I(X_j) \le I$ and
assume (without loss of generality) that they have equal means. Then $\mathbb{E}X_j^2 \le 1$, hence
the characteristic functions $f_j(t)$ of $X_j$ have second derivatives $|f_j''(t)| \le 1$. Moreover,
by Corollaries 2.3 and 7.3,

$$|f_j(t)| \le \frac{I^{1/2}}{|t|}, \qquad |f_j'(t)| \le \frac{1 + I^{1/2}}{|t|}.$$

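Now, differentiation of the equality $f(t) = f_1(t) \cdots f_5(t)$ leads to

$$f'(t) = f_1'(t)\,f_2(t) \cdots f_5(t) + \dots + f_1(t) \cdots f_4(t)\,f_5'(t),$$

hence $|f'(t)| \le \frac{5 I^2 (1 + I^{1/2})}{|t|^5}$. Differentiating once more, it should be clear that

$$|f''(t)| \le \frac{5 I^2}{t^4} + \frac{20\,I^{3/2}\,(1 + I^{1/2})^2}{|t|^5}.$$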
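The counting behind the bound on $|f''(t)|$ is a product-rule expansion; written out under the bounds $|f_j''| \le 1$, $|f_j(t)| \le I^{1/2}/|t|$ and $|f_j'(t)| \le (1 + I^{1/2})/|t|$ for the five factors (our own expansion of the "it should be clear" step):

```latex
f'' = \sum_{j=1}^{5} f_j'' \prod_{i \ne j} f_i
      \; + \sum_{j \ne k} f_j'\, f_k' \prod_{i \ne j,k} f_i ,
% 5 terms in the first sum and 5 \cdot 4 = 20 ordered pairs (j,k) in the second, so
|f''(t)| \le 5\Bigl(\frac{I^{1/2}}{|t|}\Bigr)^{4}
           + 20\Bigl(\frac{1+I^{1/2}}{|t|}\Bigr)^{2}\Bigl(\frac{I^{1/2}}{|t|}\Bigr)^{3}
         = \frac{5 I^2}{t^4} + \frac{20\, I^{3/2}\,(1+I^{1/2})^2}{|t|^5}.
% Similarly, each of the 5 terms of f' is at most
% \frac{1+I^{1/2}}{|t|}\Bigl(\frac{I^{1/2}}{|t|}\Bigr)^{4} = \frac{I^2 (1+I^{1/2})}{|t|^5}.
```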

These estimates imply that

$$|(t^2 f(t))'| \le \frac{C I^{5/2}}{t^2}, \qquad |(t^2 f(t))''| \le \frac{C I^{5/2}}{t^2} \qquad (|t| \ge 1)$$

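with some absolute constant $C$. As a consequence, one may integrate by parts in the equality
(8.5) with $x \ne 0$ to get

$$x^2\,p''(x) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} e^{-itx}\,(t^2 f(t))''\,dt.$$

Hence, for all $x \in \mathbb{R}$,

$$|p''(x)| \le \frac{C I^{5/2}}{1 + x^2} \tag{8.6}$$

with some absolute constant $C$.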

Now, to derive the second pointwise bound, first we recall that $p(x) \le I^{1/2}$. Hence,

$$|\log p(x)| \le \log(I^{1/2}) + \log\frac{I^{1/2}}{p(x)}, \tag{8.7}$$

where the last term is thus non-negative. Next, we partition the real line into the sets 
A = {x : p{x) < 2(1+1^) } complement B. On the set A, by Proposition 6.3, 

p{x) p[x) 1 + x^ 

and similarly, by (8.6), on the set B we have an analogous inequality 

\p"{x)\ log^ < \p"{x)\ log (2(1 + x^)) < c,/5/2 M^^+j^l) . 

Thus, for all $x$, applying (8.7) and again (8.6),

$$|p''(x)\,\log p(x)| \le |p''(x)|\,\log(I^{1/2}) + |p''(x)|\,\log\frac{I^{1/2}}{p(x)} \le C I^{5/2}\,(1 + \log I)\,\frac{\log(e + |x|)}{1 + x^2}.$$

Proposition 8.3 is proved.



9. Normalized sums. Proof of Theorem 1.3 

By the definition of the classes $\mathfrak{P}_k$ ($k = 1, 2, \dots$), the normalized sum

$$Z_n = \frac{X_1 + \dots + X_n}{\sqrt{n}}$$

of independent random variables $X_1, \dots, X_n$ with finite Fisher information has density
$p_n$ belonging to $\mathfrak{P}_k$, as long as $n \ge k$.
Moreover, if $I(X_j) \le I$ for all $j$, then $p_n \in \mathfrak{P}_k(2kI)$. Indeed, one can partition
the collection $X_1, \dots, X_n$ into $k$ groups and write $Z_n = U_1 + \dots + U_k$ with

$$U_i = \frac{1}{\sqrt{n}} \sum_{j=(i-1)m+1}^{im} X_j \quad (1 \le i \le k - 1), \qquad U_k = \frac{1}{\sqrt{n}} \sum_{j=(k-1)m+1}^{n} X_j,$$

where $m = [n/k]$. By Stam's inequality (6.1), for $1 \le i \le k - 1$,

$$\frac{1}{I(U_i)} \ge \frac{1}{n} \sum_{j=1}^{m} \frac{1}{I(X_{(i-1)m+j})} \ge \frac{m}{nI} \ge \frac{1}{2kI},$$

and similarly $\frac{1}{I(U_k)} \ge \frac{1}{2kI}$.

Therefore, the previous observations about densities from $\mathfrak{P}_k$ are applicable to $Z_n$
with sufficiently large $n$, as soon as the $X_j$ have finite Fisher information with a common
bound on $I(X_j)$.
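The only arithmetic in the grouping argument is that the smallest group, of size $m = [n/k]$, still satisfies $m/n \ge 1/(2k)$ once $n \ge k$ (write $n = qk + r$ with $q \ge 1$, $0 \le r < k$; then $n/(2k) < (q+1)/2 \le q = m$). A small sketch of the partition and of the resulting Stam lower bound $(\text{group size})/(nI)$ for $1/I(U_i)$, with hypothetical function names:

```python
def group_sizes(n, k):
    """Sizes of the k groups behind Z_n = U_1 + ... + U_k, with m = [n/k]."""
    m = n // k
    return [m] * (k - 1) + [n - (k - 1) * m]


def stam_lower_bound(n, k, fisher_bound):
    """Smallest of the lower bounds (group size) / (n I) for 1 / I(U_i) given by (6.1)."""
    return min(size / (n * fisher_bound) for size in group_sizes(n, k))


print(group_sizes(10, 3), stam_lower_bound(10, 3, 1.0), 1 / (2 * 3 * 1.0))
# [3, 3, 4] 0.3 0.16666666666666666
```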

A similar application of (6.1) also yields $I(Z_n) \le 2I(Z_{n_0})$ for all $n \ge n_0$. Here, the factor 2 may
actually be removed, as a consequence of a generalization of Stam's inequality obtained
by Artstein, Ball, Barthe and Naor. It is formulated below as a separate proposition
(although for our purposes the weaker inequality is sufficient).

Proposition 9.1 [A-B-B-N2]. If $(X_n)_{n \ge 1}$ are independent and identically distributed,
then

$$I(Z_n) \le I(Z_{n_0}), \qquad \text{for all } n \ge n_0.$$

We are now ready to return to Theorem 1.3 and complete its proof. 

Proof of Theorem 1.3. Let $(X_n)_{n \ge 1}$ have finite second moment and a common
characteristic function $f_1$. The characteristic function of $Z_n$ is thus

$$f_n(t) = \mathbb{E}\,e^{itZ_n} = f_1\Bigl(\frac{t}{\sqrt{n}}\Bigr)^n. \tag{9.1}$$

Clearly, a) $\Rightarrow$ b) $\Leftrightarrow$ c).

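If $Z_n$ has density $p_n$ of bounded total variation, Proposition 4.1 (applied to three independent copies of $Z_n$,
since $Z_{3n}$ is their sum divided by $\sqrt{3}$) yields $I(Z_{3n}) \le \frac{9}{2}\,\|p_n\|_{\rm TV}^2 < +\infty$. Hence we obtain c) $\Rightarrow$ a), as well, and thus the conditions
a)-c) are equivalent.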

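a) $\Rightarrow$ d). Assume that $I(Z_{n_0}) < +\infty$ for some fixed $n_0 \ge 1$. Applying Corollary 2.3
with $X = Z_{n_0}$, it follows that

$$|f_{n_0}(t)| \le \frac{\sqrt{I(Z_{n_0})}}{t}, \qquad t > 0.$$

Hence, by (9.1) with $n = n_0$ (and $t$ replaced by $\sqrt{n_0}\,t$), $|f_1(t)| \le C t^{-\varepsilon}$ with constants $\varepsilon = 1/n_0$ and
$C = (I(Z_{n_0})/n_0)^{1/(2n_0)}$, which is d).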
d) $\Rightarrow$ e) is obvious.
e) $\Rightarrow$ c). Differentiating the formula (9.1) and using the integrability assumption
(1.8) on $f_1$, we see that, for all $n \ge \nu + 2$, the characteristic function $f_n$ and its first
two derivatives are integrable with weight $|t|$. This implies in particular that $Z_n$ has a
continuously differentiable density

$$p_n(x) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} e^{-itx} f_n(t)\,dt, \tag{9.2}$$

which, by Proposition 5.1, has finite total variation

$$\int_{-\infty}^{+\infty} |p_n'(x)|\,dx \le \frac{1}{2} \int_{-\infty}^{+\infty} \bigl(|t f_n''(t)| + 2|f_n'(t)| + |t f_n(t)|\bigr)\,dt.$$

Thus, Theorem 1.3 is proved.

Remark 9.2. If we assume in Theorem 1.3 finiteness of the first absolute moment of
$X_1$ (rather than finiteness of the second moment), the statement remains valid,
provided that the integrability condition e) is replaced with a stronger condition like

$$\int_{-\infty}^{+\infty} |f_1(t)|^\nu\,dt < +\infty, \qquad \text{for some } \nu > 0. \tag{9.3}$$

In this case, it follows from (9.1) that, for all $n \ge \nu + 1$, the characteristic function $f_n$
and its derivative are integrable with weight $t^2$. Therefore, according to Proposition 5.2,
the normalized sum $Z_n$ has density $p_n$ with finite total variation

$$\|p_n\|_{\rm TV} \le \sqrt{2}\,\Bigl(\int_{-\infty}^{+\infty} |t f_n(t)|^2\,dt \int_{-\infty}^{+\infty} |(t f_n(t))'|^2\,dt\Bigr)^{1/4}.$$

As a result, we obtain the chain of implications (9.3) $\Rightarrow$ b) $\Rightarrow$ a) $\Rightarrow$ d). The latter
condition ensures that $p_n$ admits the representation (9.2) and has a continuous derivative
for sufficiently large $n$. That is, we obtain c).



10. Edgeworth-type expansions 

In the sequel, let $(X_n)_{n \ge 1}$ be independent identically distributed random variables
with mean $\mathbb{E}X_1 = 0$ and variance ${\rm Var}(X_1) = 1$. Here we collect some auxiliary results
about Edgeworth-type expansions for the distribution functions $F_n(x) = \mathbb{P}\{Z_n \le x\}$
and the densities $p_n$ of the normalized sums $Z_n = (X_1 + \dots + X_n)/\sqrt{n}$.

If the absolute moment $\mathbb{E}\,|X_1|^s$ is finite for a given integer $s \ge 2$, define

$$\varphi_s(x) = \varphi(x) + \sum_{k=1}^{s-2} q_k(x)\,n^{-k/2} \tag{10.1}$$

k=l 
with the functions $q_k$ described in the introductory section, i.e.,

$$q_k(x) = \varphi(x) \sum H_{k+2j}(x)\,\frac{1}{r_1! \cdots r_k!}\,\Bigl(\frac{\gamma_3}{3!}\Bigr)^{r_1} \cdots \Bigl(\frac{\gamma_{k+2}}{(k+2)!}\Bigr)^{r_k}. \tag{10.2}$$

Here, $H_k$ denotes the Chebyshev-Hermite polynomial of degree $k \ge 0$ with leading
coefficient 1, and the summation runs over all non-negative integer solutions $(r_1, \dots, r_k)$ to the
equation $r_1 + 2r_2 + \dots + k r_k = k$ with $j = r_1 + \dots + r_k$.
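Put also

$$\Phi_s(x) = \int_{-\infty}^x \varphi_s(y)\,dy = \Phi(x) + \sum_{k=1}^{s-2} Q_k(x)\,n^{-k/2}. \tag{10.3}$$

Similarly to $q_k$, the functions $Q_k$ have an explicit description involving the cumulants
$\gamma_3, \dots, \gamma_{k+2}$ of $X_1$, namely,

$$Q_k(x) = -\varphi(x) \sum H_{k+2j-1}(x)\,\frac{1}{r_1! \cdots r_k!}\,\Bigl(\frac{\gamma_3}{3!}\Bigr)^{r_1} \cdots \Bigl(\frac{\gamma_{k+2}}{(k+2)!}\Bigr)^{r_k},$$

where the summation is the same as in (10.2), cf. [B-RR] or [P].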
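The formulas for $q_k$ and $Q_k$ are easy to evaluate mechanically. The sketch below (our own implementation, with hypothetical names) enumerates the solutions $(r_1, \dots, r_k)$ of $r_1 + 2r_2 + \dots + kr_k = k$, builds $H_n = He_n$ by the recurrence $He_{n+1} = x\,He_n - n\,He_{n-1}$, and returns $q_k(x)$ or $Q_k(x)$; the cumulants $\gamma_m$ are supplied by the caller. It reproduces $q_1 = \frac{\gamma_3}{6} H_3 \varphi$ and $q_2 = \bigl(\frac{\gamma_3^2}{72} H_6 + \frac{\gamma_4}{24} H_4\bigr)\varphi$, and $Q_k' = q_k$ follows from $(\varphi H_{m-1})' = -\varphi H_m$.

```python
import math


def hermite_he(n, x):
    """Chebyshev-Hermite polynomial He_n(x) with leading coefficient 1."""
    if n == 0:
        return 1.0
    h_prev, h = 1.0, x
    for m in range(1, n):
        h_prev, h = h, x * h - m * h_prev  # He_{m+1} = x He_m - m He_{m-1}
    return h


def solutions(k):
    """All (r_1, ..., r_k) with r_m >= 0 and r_1 + 2 r_2 + ... + k r_k = k."""
    def rec(m, remaining):
        if m > k:
            if remaining == 0:
                yield ()
            return
        for r in range(remaining // m + 1):
            for rest in rec(m + 1, remaining - m * r):
                yield (r,) + rest
    return list(rec(1, k))


def edgeworth_term(k, gammas, x, integrated=False):
    """q_k(x) from (10.2), or Q_k(x) (its primitive vanishing at -inf) if integrated=True.

    gammas maps m -> gamma_m, the cumulants of X_1, for m = 3, ..., k + 2.
    """
    phi = math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
    total = 0.0
    for r in solutions(k):
        j = sum(r)
        coeff = 1.0
        for m, r_m in enumerate(r, start=1):
            coeff *= (gammas[m + 2] / math.factorial(m + 2)) ** r_m / math.factorial(r_m)
        degree = k + 2 * j - 1 if integrated else k + 2 * j
        total += coeff * hermite_he(degree, x)
    return -phi * total if integrated else phi * total


g = {3: 0.7, 4: 1.3, 5: -0.4}  # illustrative cumulants, not tied to a specific law
print(solutions(2), edgeworth_term(1, g, 0.8), edgeworth_term(2, g, 0.8))
```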

The functions $\varphi_s$ and $\Phi_s$ are used to approximate the density and distribution func-
tion of $Z_n$ with error of order smaller than $n^{-(s-2)/2}$. The following lemma is classical.

Lemma 10.1. Assume that $\limsup_{|t| \to \infty} |f_1(t)| < 1$. If $\mathbb{E}\,|X_1|^s < +\infty$ ($s \ge 3$),
then as $n \to \infty$, uniformly over all $x$,

$$(1 + |x|^s)\,\bigl(F_n(x) - \Phi_s(x)\bigr) = o\bigl(n^{-(s-2)/2}\bigr). \tag{10.4}$$

Let us emphasize that (10.4) remains valid for general real $s \ge 2$. Here, $\Phi_s$ should be
replaced with $\Phi_{[s]}$. For the range $2 \le s < 3$ the Cramér condition for the characteristic
function is not used, and the result was obtained in [O-P]; the case $s \ge 3$ is treated in
[P] (cf. Theorem 2, Ch. VI, p. 168).

We also need to describe the approximation of densities. Recall that $Z_n$ have the
characteristic functions

$$f_n(t) = f_1\Bigl(\frac{t}{\sqrt{n}}\Bigr)^n,$$

where $f_1$ stands for the characteristic function of $X_1$. If the Fisher information $I(Z_{n_0})$
is finite, then, by Corollary 2.3, $|f_{n_0}(t)| \le c/|t|$ with some constant $c$ (namely, $c^2 = I(Z_{n_0})$).
Hence, given $m \ge 1$, the characteristic functions of $Z_n$ admit a polynomial bound
$|f_n(t)| \le c_m |t|^{-m}$ for $n \ge m n_0$ and with $c_m$ which does not depend on $t$. Thus, for all
sufficiently large $n$, $Z_n$ have continuous bounded densities

$$p_n(x) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} e^{-itx} f_n(t)\,dt,$$
which have continuous derivatives

$$p_n^{(l)}(x) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} (-it)^l\,e^{-itx} f_n(t)\,dt \tag{10.5}$$

of any prescribed order.

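Lemma 10.2. Assume $I(Z_{n_0}) < +\infty$ for some $n_0$, and let $\mathbb{E}\,|X_1|^s < +\infty$ ($s \ge 2$).
Fix $l = 0, 1, \dots$ Then, for all sufficiently large $n$,

$$(1 + |x|^s)\,\bigl|p_n^{(l)}(x) - \varphi_s^{(l)}(x)\bigr| \le \varepsilon_n\,n^{-(s-2)/2}\,\psi_{l,n}(x), \qquad x \in \mathbb{R}, \tag{10.6}$$

where $\varepsilon_n \to 0$ as $n \to \infty$, and

$$\sup_x |\psi_{l,n}(x)| \le 1, \qquad \int_{-\infty}^{+\infty} \psi_{l,n}(x)^2\,dx \le 1. \tag{10.7}$$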

In case $l = 0$, this lemma with the first bound $\sup_x |\psi_{l,n}(x)| \le 1$ is a well-known
result, which does not require the finiteness of Fisher information, using
only the boundedness of $p_n$ for large $n$. We can refer to [P], p. 211
in case $s \ge 3$ and to [P], pp. 198-201 for the case $s = 2$, when $\varphi_s = \varphi$. The result
follows from the corresponding Edgeworth-type approximation of $f_n(t)$ by the Fourier
transforms of $\varphi_s(x)$ on growing intervals such as $|t| \le C n^{1/2}$ in case $s \ge 3$. Repeating
the arguments on pp. 211-212 of [P] and applying Plancherel's formula, one can easily
obtain the second bound in (10.7) as well. In fact, the case $l \ge 1$ is similar, since the
appearance of the additional factor $(-it)^l$ in (10.5) does not create any difficulty due to
the polynomial decay at infinity of the characteristic functions $f_n$.

For the proof of Theorem 1.1, the lemma will be used with the values $l = 0, 1, 2$ only.

11. Behaviour of densities not far from the origin 

To study the asymptotic behavior of the Fisher information distance

$$I(Z_n \| Z) = \int_{-\infty}^{+\infty} \frac{(p_n'(x) + x\,p_n(x))^2}{p_n(x)}\,dx,$$

we split the domain of integration into the interval $|x| \le T_n$ and its complement. Thus,
define

$$J_0 = \int_{|x| \le T_n} \frac{(p_n'(x) + x\,p_n(x))^2}{p_n(x)}\,dx$$

and similarly $J_1$ for the region $|x| > T_n$. If $T_n$ is not too large, the first integral can be
treated with the help of Lemma 10.2. Namely, we take

$$T_n = \sqrt{(s-2)\log n + s\log\log n + \rho_n} \qquad (s > 2), \tag{11.1}$$
where $\rho_n \to +\infty$ is a sufficiently slowly growing sequence whose growth is restricted by the decay of the sequence $\varepsilon_n$ in (10.6). In other words, $[-T_n, T_n]$ represents an asymptotically largest interval on which we can guarantee that the densities $p_n$ of $Z_n$ are separated from zero. Moreover, $\sup_{|x| \le T_n} |p_n(x)/\varphi(x) - 1| \to 0$. To cover the case $s = 2$, one may put $T_n = \sqrt{\rho_n}$, where $\rho_n \to +\infty$ is a sufficiently slowly growing sequence. With this choice of $T_n$, the integral $J_1$ can be estimated via moderate deviation inequalities.
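The choice (11.1) balances the error in (10.6) against the size of $\varphi$ at the cut-off, because by construction $\varphi(T_n)\,n^{(s-2)/2}(\log n)^{s/2} = e^{-\rho_n/2}/\sqrt{2\pi}$ exactly. The following sketch checks this identity. It then evaluates the ratio of the error bound $\varepsilon_n n^{-(s-2)/2}/(1 + T_n^s)$ to $\varphi(T_n)$. The values of $n$, $s$ and $\rho_n$ are illustrative, and `eps` is a hypothetical constant standing in for $\varepsilon_n$.

```python
import math


def cutoff(n, s, rho):
    """T_n from (11.1): T_n^2 = (s - 2) log n + s log log n + rho_n, for s > 2."""
    return math.sqrt((s - 2) * math.log(n) + s * math.log(math.log(n)) + rho)


def phi(x):
    """Standard normal density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def error_to_density_ratio(n, s, rho, eps):
    """Local-limit error eps * n^{-(s-2)/2} / (1 + T_n^s), relative to phi(T_n).
    eps is a hypothetical stand-in for eps_n in (10.6)."""
    t = cutoff(n, s, rho)
    return eps * n ** (-(s - 2) / 2) / ((1 + t ** s) * phi(t))


n, s, rho, eps = 1e6, 4.0, 2.0, 0.01
t = cutoff(n, s, rho)
# By construction: phi(T_n) * n^{(s-2)/2} * (log n)^{s/2} = exp(-rho/2) / sqrt(2 pi).
lhs = phi(t) * n ** ((s - 2) / 2) * math.log(n) ** (s / 2)
rhs = math.exp(-rho / 2) / math.sqrt(2 * math.pi)
# Hence the ratio is below eps * exp(rho/2) * sqrt(2 pi) * (log n)^{s/2} / T_n^s.
ratio = error_to_density_ratio(n, s, rho, eps)
bound = eps * math.exp(rho / 2) * math.sqrt(2 * math.pi) * math.log(n) ** (s / 2) / t ** s
```

Up to a factor depending on $s$, the ratio is of order $\varepsilon_n e^{\rho_n/2}$. It is therefore small only if $\rho_n$ grows slowly compared with the decay of $\varepsilon_n$.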

In this section we focus on $J_0$ and provide an asymptotic expansion for it. Its remainder term turns out to be slightly better than that of the resulting expansion (1.3) of Theorem 1.1.

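Lemma 11.1. Let $s \ge 3$ be an integer. If $I(Z_{n_0}) < +\infty$ for some $n_0$, then

$$ J_0 = \frac{c_1}{n} + \frac{c_2}{n^2} + \dots + \frac{c_{[(s-2)/2]}}{n^{[(s-2)/2]}} + o\Big(\frac{1}{n^{(s-2)/2}\,(\log n)^{(s-1)/2}}\Big), $$

where the coefficients $c_j$ are defined in (1.4).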

Proof. Let us adopt the convention of writing $\delta_n$ for any sequence of functions satisfying $|\delta_n(x)| \le \varepsilon_n n^{-(s-2)/2}$ with $\varepsilon_n \to 0$ as $n \to \infty$, at least on the intervals $|x| \le T_n$. For example, the statement of Lemma 10.2 with $l = 0$ may be written as

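$$ p_n(x) = (1 + u_s(x))\,\varphi(x) + \frac{\delta_n}{1 + |x|^s}, \qquad (11.2) $$

where

$$ u_s(x) = \frac{\varphi_s(x) - \varphi(x)}{\varphi(x)} = \sum_{k=1}^{s-2} \frac{q_k(x)}{\varphi(x)}\,\frac{1}{n^{k/2}}. $$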

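Combining the lemma with $l = 0$ and $l = 1$, and recalling that $\varphi'(x) + x\varphi(x) = 0$, we obtain another representation

$$ p_n'(x) + x p_n(x) = w_s(x) + \frac{(1 + |x|)\,\delta_n}{1 + |x|^s}, \qquad (11.3) $$

where

$$ w_s(x) = \sum_{k=1}^{s-2} \frac{q_k'(x) + x q_k(x)}{n^{k/2}}. $$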

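Note that the functions $u_s$ and $w_s$ depend on $n$ as a parameter and become small for growing $n$. More precisely, it follows from the definition of $q_k$ that, for all $x \in \mathbb{R}$,

$$ \frac{|w_s(x)|}{\varphi(x)} \le C_s\,\frac{1 + |x|^{3(s-2)}}{\sqrt{n}} \qquad \text{and} \qquad |u_s(x)| \le C_s\,\frac{1 + |x|^{3(s-2)}}{\sqrt{n}} \qquad (11.4) $$

with some constants $C_s$ depending on $s$ and the cumulants of $X_1$ only. In particular, for $|x| \le T_n$ and any prescribed $0 < \varepsilon < \frac12$,

$$ \frac{|w_s(x)|}{\varphi(x)} \le \frac{1}{n^{1/2 - \varepsilon}} \qquad \text{and} \qquad |u_s(x)| \le \frac{1}{n^{1/2 - \varepsilon}} \qquad (11.5) $$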



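with sufficiently large $n$. In addition, with a properly chosen sequence $\rho_n$, we have

$$ \sup_{|x| \le T_n} \frac{|\delta_n(x)|}{(1 + |x|^s)\,\varphi(x)} \le \frac14 \qquad (11.6) $$

for all sufficiently large $n$. Now, for $|x| \le T_n$, by (11.2),

$$ \Big|\frac{p_n(x)}{\varphi(x)} - 1\Big| \le |u_s(x)| + \frac{|\delta_n(x)|}{(1 + |x|^s)\,\varphi(x)}. $$

Hence, by Lemma 10.2 together with (11.5) and (11.6), $|p_n(x)/\varphi(x) - 1| \le \frac12$ on the interval $|x| \le T_n$ for all sufficiently large $n$.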



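Since

$$ \frac{1}{(1 + u_s(x))\,\varphi(x)} - \frac{1}{p_n(x)} = \frac{\delta_n/(1 + |x|^s)}{(1 + u_s(x))\,\varphi(x)\,p_n(x)} $$

and both $1 + u_s(x)$ and $p_n(x)/\varphi(x)$ are at least $\frac12$ on $[-T_n, T_n]$, we obtain from (11.2)

$$ \frac{1}{p_n(x)} = \frac{1}{(1 + u_s(x))\,\varphi(x)} + \frac{\delta_n}{(1 + |x|^s)\,\varphi(x)^2}. $$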

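Combining this with (11.3) and using (11.5), we are led to

$$ \frac{(p_n'(x) + x p_n(x))^2}{p_n(x)} = \frac{w_s(x)^2}{(1 + u_s(x))\,\varphi(x)} + \sum_{k=1}^{5} r_{nk}(x), \qquad |x| \le T_n, $$

where

$$ r_{n1}(x) = \frac{\delta_n\,w_s(x)}{(1 + |x|^{s-1})\,\varphi(x)}, \qquad r_{n2}(x) = \frac{\delta_n\,w_s(x)^2}{(1 + |x|^s)\,\varphi(x)^2}, \qquad r_{n3}(x) = \frac{\delta_n^2\,w_s(x)}{(1 + |x|^{2s-1})\,\varphi(x)^2}, $$

$$ r_{n4}(x) = \frac{\delta_n^2}{(1 + |x|^{2s-2})\,\varphi(x)}, \qquad r_{n5}(x) = \frac{\delta_n^3}{(1 + |x|^{3s-2})\,\varphi(x)^2}. $$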

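Here, according to the left inequality in (11.5), the remainder terms $r_{n1}(x)$ and $r_{n2}(x)$ are uniformly bounded on $[-T_n, T_n]$ by $n^{-(s-1)/2 + \varepsilon}$. A similar bound also holds for $r_{n3}(x)$, by taking into account (11.6). In addition, integrating by parts, for large $n$ and with some constants $C, C'$ (independent of $n$), we have

$$ \int_{|x| \le T_n} |r_{n4}(x)|\,dx \le \frac{C\varepsilon_n^2}{n^{s-2}}\Big(1 + \int_1^{T_n} \frac{e^{x^2/2}}{x^{2s-2}}\,dx\Big) \le \frac{C'\varepsilon_n^2}{n^{s-2}}\,\frac{e^{T_n^2/2}}{T_n^{2s-1}} = o\Big(\frac{1}{n^{(s-2)/2}\,(\log n)^{(s-1)/2}}\Big). $$

With a similar argument, the same relation also holds for the integral of $|r_{n5}(x)|$.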
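The estimate for $r_{n4}$ relies on the Laplace-type asymptotics $\int_1^{T} x^{-(2s-2)} e^{x^2/2}\,dx \sim T^{-(2s-1)} e^{T^2/2}$ as $T \to \infty$. This follows by integrating by parts against $d(e^{x^2/2}) = x e^{x^2/2}\,dx$. The small numerical sketch below uses a pure-Python Simpson rule, and its helper names are illustrative. It shows the ratio of the two sides decreasing towards 1, roughly like $1 + (2s-1)/T^2$.

```python
import math


def simpson(f, a, b, steps=20000):
    """Composite Simpson rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3


def tail_ratio(s, T):
    """Ratio of int_1^T x^{-(2s-2)} e^{x^2/2} dx to its asymptotic form T^{-(2s-1)} e^{T^2/2}."""
    integral = simpson(lambda x: math.exp(x * x / 2) / x ** (2 * s - 2), 1.0, T)
    return integral / (math.exp(T * T / 2) / T ** (2 * s - 1))


r10, r20 = tail_ratio(3, 10.0), tail_ratio(3, 20.0)
```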
Thus,

$$ J_0 = \int_{|x| \le T_n} \frac{w_s(x)^2}{(1 + u_s(x))\,\varphi(x)}\,dx + o\Big(\frac{1}{n^{(s-2)/2}\,(\log n)^{(s-1)/2}}\Big). \qquad (11.7) $$

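Now, by Taylor's expansion around zero, in the interval $|u| \le \frac12$ we have

$$ \frac{1}{1+u} = \sum_{k=0}^{s-4} (-u)^k + \theta\,u^{s-3}, \qquad |\theta| \le 2 $$

(there are no terms in the sum for $s = 3$). Hence, with some $-2 \le \theta_n \le 2$,

$$ \int_{|x| \le T_n} \frac{w_s(x)^2}{(1 + u_s(x))\,\varphi(x)}\,dx = \sum_{k=0}^{s-4} (-1)^k \int_{|x| \le T_n} \frac{w_s(x)^2\,u_s(x)^k}{\varphi(x)}\,dx + \theta_n \int_{|x| \le T_n} \frac{w_s(x)^2\,|u_s(x)|^{s-3}}{\varphi(x)}\,dx. $$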
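The Taylor step can be made explicit. For $|u| \le \frac12$ one has exactly $\frac{1}{1+u} - \sum_{k=0}^{s-4}(-u)^k = \frac{(-u)^{s-3}}{1+u}$, so $\theta = (-1)^{s-3}/(1+u)$ and $|\theta| \le 2$. A short check on a grid of $u$ values follows; the helper name is hypothetical.

```python
def taylor_theta(u, s):
    """theta with 1/(1+u) = sum_{k=0}^{s-4} (-u)^k + theta * u^(s-3)  (s >= 3, u != 0)."""
    partial = sum((-u) ** k for k in range(s - 3))  # empty sum when s = 3
    return (1 / (1 + u) - partial) / u ** (s - 3)


grid = [k / 10 for k in range(-5, 6) if k != 0]  # u in [-1/2, 1/2] without 0
thetas = {(s, u): taylor_theta(u, s) for s in range(3, 9) for u in grid}
```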

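At the expense of a small error, these integrals may be extended to the whole real line. Indeed, for large enough $n$, by (11.4), we have, for $k = 0, 1, \dots, s-4$, with some common constant $C_s$,

$$ \int_{|x| > T_n} \frac{w_s(x)^2\,|u_s(x)|^k}{\varphi(x)}\,dx \le \frac{C_s}{n^{(k+2)/2}} \int_{|x| > T_n} \big(1 + |x|^{3(s-2)}\big)^{k+2}\,\varphi(x)\,dx = o\Big(\frac{1}{n^{(s-1)/2}}\Big). $$

Moreover,

$$ \int_{-\infty}^{+\infty} \frac{w_s(x)^2\,|u_s(x)|^{s-3}}{\varphi(x)}\,dx \le \frac{C_s}{n^{(s-1)/2}} \int_{-\infty}^{+\infty} \big(1 + |x|^{3(s-2)}\big)^{s-1}\,\varphi(x)\,dx = O\Big(\frac{1}{n^{(s-1)/2}}\Big). $$

Therefore,

$$ \int_{|x| \le T_n} \frac{w_s(x)^2}{(1 + u_s(x))\,\varphi(x)}\,dx = \sum_{k=0}^{s-4} (-1)^k \int_{-\infty}^{+\infty} \frac{w_s(x)^2\,u_s(x)^k}{\varphi(x)}\,dx + O\Big(\frac{1}{n^{(s-1)/2}}\Big). $$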

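Inserting this in (11.7), we thus arrive at

$$ J_0 = \sum_{k=0}^{s-4} (-1)^k \int_{-\infty}^{+\infty} \frac{w_s(x)^2\,u_s(x)^k}{\varphi(x)}\,dx + o\Big(\frac{1}{n^{(s-2)/2}\,(\log n)^{(s-1)/2}}\Big). \qquad (11.8) $$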

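In the next step, we develop this representation by expressing $u_s$ and $w_s$ in terms of $q_k$, while expanding the sum in (11.8) in powers of $1/\sqrt{n}$ as

$$ \sum_{k=0}^{s-4} (-1)^k \int_{-\infty}^{+\infty} \frac{w_s(x)^2\,u_s(x)^k}{\varphi(x)}\,dx = \sum_{j=2}^{s-2} \frac{a_j}{n^{j/2}} + O\Big(\frac{1}{n^{(s-1)/2}}\Big). $$

More precisely, here the coefficients are given by

$$ a_j = \sum_{k=2}^{j} (-1)^k \sum \int_{-\infty}^{+\infty} \frac{(q_{r_1}'(x) + x q_{r_1}(x))\,(q_{r_2}'(x) + x q_{r_2}(x))\,q_{r_3}(x) \cdots q_{r_k}(x)}{\varphi(x)^{k-1}}\,dx, \qquad (11.9) $$

with summation over all positive solutions $(r_1, \dots, r_k)$ to $r_1 + \dots + r_k = j$. Moreover, when $j$ is odd, the above integrals vanish. Indeed, differentiating the equality (10.2), which defines the functions $q_k$, and using the property $H_n'(x) = n H_{n-1}(x)$ ($n \ge 1$), we obtain a similar equality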



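$$ q_k'(x) + x q_k(x) = \varphi(x) \sum (k + 2l)\,H_{k+2l-1}(x)\,\frac{1}{r_1! \cdots r_k!}\Big(\frac{\gamma_3}{3!}\Big)^{r_1} \cdots \Big(\frac{\gamma_{k+2}}{(k+2)!}\Big)^{r_k}, \qquad (11.10) $$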

with summation over all non-negative solutions $(r_1, \dots, r_k)$ to $r_1 + 2r_2 + \dots + k r_k = k$, and where $l = r_1 + \dots + r_k$. Hence, the integrand in (11.9) represents a linear combination of functions of the form

$$ H_{d_1}(x) \cdots H_{d_k}(x)\,\varphi(x). $$

Note that here the sum of the indices $d_1 + \dots + d_k$ has the same parity as $j$. We can now apply the following property of the Chebyshev-Hermite polynomials (see Szegő 1967): if the sum of the indices $d_1, \dots, d_k$ is odd, then necessarily

$$ \int_{-\infty}^{+\infty} H_{d_1}(x) \cdots H_{d_k}(x)\,\varphi(x)\,dx = 0 $$

(indeed, $H_d(-x) = (-1)^d H_d(x)$, so the integrand is then an odd function). Hence, $a_j = 0$ whenever $j$ is odd, and putting $c_j = a_{2j}$, we arrive at the assertion of the lemma.
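Two facts are used at the end of the proof. One is the identity $(H_m\varphi)' + xH_m\varphi = mH_{m-1}\varphi$ that underlies (11.10). The other is the vanishing of $\int H_{d_1}\cdots H_{d_k}\varphi\,dx$ for odd index sums. Both are easy to test numerically. The sketch below assumes the probabilists' normalization $H_{k+1} = xH_k - kH_{k-1}$, which is consistent with $H_n' = nH_{n-1}$. It uses a Simpson rule truncated to $[-12, 12]$, and all names are illustrative.

```python
import math


def hermite(m, x):
    """Chebyshev-Hermite polynomial H_m(x), from H_{k+1} = x H_k - k H_{k-1}."""
    if m == 0:
        return 1.0
    h_prev, h = 1.0, x  # H_0(x), H_1(x)
    for k in range(1, m):
        h_prev, h = h, x * h - k * h_prev
    return h


def phi(x):
    """Standard normal density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def gaussian_integral(f, a=-12.0, b=12.0, steps=4000):
    """Simpson approximation of the integral of f(x) * phi(x) over [a, b]."""
    h = (b - a) / steps
    total = f(a) * phi(a) + f(b) * phi(b)
    for k in range(1, steps):
        x = a + k * h
        total += (4 if k % 2 else 2) * f(x) * phi(x)
    return total * h / 3


def product_integral(indices):
    """Integral of H_{d_1}(x) ... H_{d_k}(x) phi(x) over the real line."""
    return gaussian_integral(lambda x: math.prod(hermite(d, x) for d in indices))


odd_case = product_integral([2, 3, 4])   # index sum 9 is odd: should vanish
even_case = product_integral([1, 2, 3])  # H_1 H_2 = H_3 + 2 H_1, so this equals 3! = 6

# (H_m phi)' + x H_m phi = m H_{m-1} phi, checked with a central difference.
m, x0, dx = 4, 0.8, 1e-5
derivative = (hermite(m, x0 + dx) * phi(x0 + dx) - hermite(m, x0 - dx) * phi(x0 - dx)) / (2 * dx)
lhs = derivative + x0 * hermite(m, x0) * phi(x0)
rhs = m * hermite(m - 1, x0) * phi(x0)
```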

Remark. In formula (11.9) with $c_j = a_{2j}$, we perform summation over all integers $r_i \ge 1$ such that $r_1 + \dots + r_k = 2j$. Hence, all $r_i \le 2j - 1$, and thus the functions $q_{r_i}$ are determined by the cumulants up to order $2j + 1$. Consequently, $c_j$ is a polynomial in $\gamma_3, \dots, \gamma_{2j+1}$.



12. Moderate deviations 

We now consider the second integral

$$ J_1 = \int_{|x| > T_n} \frac{(p_n'(x) + x p_n(x))^2}{p_n(x)}\,dx $$

contributing to the Fisher information distance $I(Z_n\|Z)$.

Lemma 12.1. Let $s \ge 3$ be an integer. If $I(Z_{n_0}) < +\infty$ for some $n_0$, then

$$ J_1 = o\Big(\frac{1}{n^{(s-2)/2}\,(\log n)^{(s-3)/2}}\Big). $$

Proof. Write

$$ J_1 \le 2J_{1,1} + 2J_{1,2} = 2\int_{|x| > T_n} \frac{p_n'(x)^2}{p_n(x)}\,dx + 2\int_{|x| > T_n} x^2 p_n(x)\,dx. \qquad (12.1) $$
Using Lemma 10.1, we conclude that, for $s = 3, 4, \dots$,

$$ J_{1,2} = o\Big(\frac{1}{(n\log n)^{(s-2)/2}}\Big). \qquad (12.2) $$
Indeed, integrating by parts, we have

$$ \int_{T_n}^{+\infty} x^2 p_n(x)\,dx = T_n^2\,(1 - F_n(T_n)) + 2\int_{T_n}^{+\infty} x\,(1 - F_n(x))\,dx. $$
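This integration by parts can be sanity-checked in the Gaussian case $F_n = \Phi$, an illustrative choice only and not the distribution of $Z_n$. The same check covers the elementary bounds $1 - \Phi(x) \le \varphi(x)/x$ and $x(1 - \Phi(x)) \le \varphi(x)$ used below. In this case $\int_T^\infty x^2\varphi(x)\,dx = T\varphi(T) + 1 - \Phi(T)$ in closed form. Helper names are hypothetical.

```python
import math


def phi(x):
    """Standard normal density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def upper_tail(x):
    """1 - Phi(x) for the standard normal distribution function Phi."""
    return 0.5 * math.erfc(x / math.sqrt(2))


def simpson(f, a, b, steps=20000):
    """Composite Simpson rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3


T = 1.5
# Left-hand side: int_T^inf x^2 phi(x) dx = T phi(T) + 1 - Phi(T).
lhs = T * phi(T) + upper_tail(T)
# Right-hand side of the integration by parts with F_n replaced by Phi;
# the upper limit T + 20 stands in for +infinity.
tail_integral = simpson(lambda x: x * upper_tail(x), T, T + 20.0)
rhs = T ** 2 * upper_tail(T) + 2 * tail_integral
```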

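Recall the definition (10.3) of the approximating functions $\Phi_s$, and apply the elementary inequality $1 - \Phi(x) \le \varphi(x)/x$ ($x > 0$). From (10.4) we then obtain

$$ T_n^2\,(1 - F_n(T_n)) = T_n^2\,(1 - \Phi_s(T_n)) + T_n^2\,(\Phi_s(T_n) - F_n(T_n)) $$

$$ \le T_n\varphi(T_n) + C\,T_n^2 \sum_{k=1}^{s-2} \frac{(1 + T_n^{3k})\,\varphi(T_n)}{n^{k/2}} + o\Big(\frac{T_n^2}{n^{(s-2)/2}\,(1 + T_n^s)}\Big) = o\Big(\frac{1}{(n\log n)^{(s-2)/2}}\Big) $$

with some constant $C$. In addition, since $x\,(1 - \Phi(x)) \le \varphi(x)$,

$$ \int_{T_n}^{+\infty} x\,(1 - F_n(x))\,dx \le 1 - \Phi(T_n) + C\sum_{k=1}^{s-2} \frac{1}{n^{k/2}} \int_{T_n}^{+\infty} x^{3k+1}\varphi(x)\,dx + o\Big(\frac{1}{n^{(s-2)/2}}\Big) \int_{T_n}^{+\infty} \frac{dx}{x^{s-1}} = o\Big(\frac{1}{(n\log n)^{(s-2)/2}}\Big). $$

With similar estimates for the half-axis $x < -T_n$, we arrive at the relation (12.2).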

Let us now estimate $J_{1,1}$. Denote by $J_{1,1}^+$ the part of this integral corresponding to the interval $x > T_n$. By Propositions 6.2, 6.4 and 8.3, for sufficiently large $n$ one may integrate by parts to justify the formula

$$ J_{1,1}^+ = -p_n'(T_n)\log p_n(T_n) - \int_{T_n}^{+\infty} p_n''(x)\log p_n(x)\,dx. \qquad (12.3) $$
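Formula (12.3) is just $\int p'^2/p = \int p'\,(\log p)'$ integrated by parts. The sketch below verifies it for $p = \varphi$, again an illustrative choice and not $p_n$. Here $\varphi' = -x\varphi$ and $\varphi'' = (x^2-1)\varphi$, and both sides equal $T\varphi(T) + 1 - \Phi(T)$. The helper names are hypothetical.

```python
import math


def phi(x):
    """Standard normal density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def log_phi(x):
    """log phi(x), computed directly to avoid underflow."""
    return -x * x / 2 - 0.5 * math.log(2 * math.pi)


def upper_tail(x):
    """1 - Phi(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2))


def simpson(f, a, b, steps=20000):
    """Composite Simpson rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3


T = 2.0
# J^+ for p = phi: int_T^inf phi'(x)^2 / phi(x) dx, in closed form and by quadrature.
lhs = T * phi(T) + upper_tail(T)
direct = simpson(lambda x: (x * phi(x)) ** 2 / phi(x), T, T + 20.0)
# Right-hand side of (12.3): -phi'(T) log phi(T) - int_T^inf phi''(x) log phi(x) dx.
boundary = T * phi(T) * log_phi(T)  # since -phi'(T) = T phi(T)
bulk = simpson(lambda x: (x * x - 1) * phi(x) * log_phi(x), T, T + 20.0)
rhs = boundary - bulk
```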



Since $p_n(x) \le C\sqrt{I(Z_{n_0})}$ for all $x$ (Propositions 2.2 and 9.1), and since $p_n(T_n) \ge \frac12\varphi(T_n)$, we see that for all sufficiently large $n$, $|\log p_n(T_n)| \le c\,T_n^2$ with some constants $C$ and $c$. Therefore, by Lemma 10.2 applied to the derivative of the density $p_n$, we get

$$ |p_n'(T_n)\log p_n(T_n)| \le c\,T_n^2\,|p_n'(T_n)| = o\Big(\frac{1}{n^{(s-2)/2}\,(\log n)^{(s-3)/2}}\Big). \qquad (12.4) $$

A similar relation holds at the point $-T_n$ as well.
It remains to evaluate the integral in (12.3). First we integrate over the set $A = \{x > T_n : p_n(x) \le \varphi(x)^4\}$. By the upper bound of Proposition 6.4 and applying Proposition 9.1 once more, we have, for all $x$ and all sufficiently large $n$,

$$ |p_n''(x)| \le C\sqrt{p_n(x)} $$

with some constant $C$ depending on $I(Z_{n_0})$ only. Hence, with some constants $c, c'$ (using that $u \mapsto \sqrt{u}\,|\log u|$ is increasing on $(0, e^{-2}]$),

$$ \int_A |p_n''(x)\log p_n(x)|\,dx \le c\int_A \sqrt{p_n(x)}\,|\log p_n(x)|\,dx \le c'\int_{T_n}^{+\infty} x^2\varphi(x)^2\,dx = o\Big(\frac{1}{n^{s-2}}\Big). $$

On the other hand, for the complementary set $B = (T_n, +\infty) \setminus A$, we have

$$ \int_B |p_n''(x)\log p_n(x)|\,dx \le c\int_B x^2\,|p_n''(x)|\,dx. \qquad (12.5) $$

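We now apply Lemma 10.2 to approximate the second derivative. It yields

$$ \int_{T_n}^{+\infty} x^2\,|p_n''(x)|\,dx \le \int_{T_n}^{+\infty} x^2\,|\varphi_s''(x)|\,dx + \frac{\varepsilon_n}{n^{(s-2)/2}} \int_{T_n}^{+\infty} \frac{x^2\,|\psi_{2,n}(x)|}{1 + x^s}\,dx. $$

Here, using $\varphi''(x) = (x^2 - 1)\varphi(x)$, the first integral on the right-hand side is bounded by

$$ \int_{T_n}^{+\infty} x^2\,|\varphi_s''(x) - \varphi''(x)|\,dx + \int_{T_n}^{+\infty} x^2\,|x^2 - 1|\,\varphi(x)\,dx = o\Big(\frac{1}{n^{(s-2)/2}\,(\log n)^{(s-3)/2}}\Big). $$

To estimate the second integral, we use Cauchy's inequality, which gives

$$ \int_{T_n}^{+\infty} \frac{x^2\,|\psi_{2,n}(x)|}{1 + x^s}\,dx \le \Big(\int_{T_n}^{+\infty} \frac{x^4}{(1 + x^s)^2}\,dx\Big)^{1/2} \le \frac{C}{T_n^{s - 5/2}}. $$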

Therefore, returning to (12.5), we get

$$ \int_B |p_n''(x)\log p_n(x)|\,dx = o\Big(\frac{1}{n^{(s-2)/2}\,(\log n)^{(s-3)/2}}\Big). $$

Together with the bound for the integral over the set $A$, we thus have

$$ J_{1,1}^+ = o\Big(\frac{1}{n^{(s-2)/2}\,(\log n)^{(s-3)/2}}\Big). $$

The part of the integral $J_{1,1}$ taken over the half-axis $x < -T_n$ admits a similar bound; hence the lemma is proved.

The statement of Theorem 1.1 in case $s \ge 3$ thus follows from Lemmas 11.1 and 12.1.
13. Theorem 1.1 in the case s = 2 and Corollary 1.2

In the most general case $s = 2$, the proof of Theorem 1.1 does not need Edgeworth-type expansions. With the tools developed in the previous sections, the argument is straightforward and may be viewed as an alternative approach to the Barron-Johnson theorem.

To give more details, recall that once the Fisher information $I(Z_{n_0})$ is finite, the normalized sums $Z_n$ with $n > n_0 + 1$ have uniformly bounded densities $p_n$ with bounded continuous derivatives (Proposition 6.2). Moreover, we have a well-known local limit theorem for densities; one of its variants is described in Lemma 10.2. In particular,

$$ \sup_x\,(1 + x^2)\,|p_n(x) - \varphi(x)| = o(1), \qquad (13.1) $$

$$ \sup_x\,(1 + x^2)\,|p_n'(x) - \varphi'(x)| = o(1), \qquad (13.2) $$

as $n \to \infty$, where the convergence of the derivatives relies upon the finiteness of the Fisher information.

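Splitting the integration in

$$ I(Z_n\|Z) = \int_{-\infty}^{+\infty} \frac{(p_n'(x) + x p_n(x))^2}{p_n(x)}\,dx $$

into the two regions $|x| \le T$ and $|x| > T$, we therefore have, for every fixed $T > 1$,

$$ J_0 = \int_{|x| \le T} \frac{(p_n'(x) + x p_n(x))^2}{p_n(x)}\,dx = o(1), \qquad n \to \infty. \qquad (13.3) $$

Indeed, by (13.1) and (13.2), $p_n \to \varphi$ and $p_n' + x p_n \to \varphi' + x\varphi = 0$ uniformly on $[-T, T]$, while $p_n$ stays bounded away from zero there.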

On the other hand, write as we did before

$$ J_1 = \int_{|x| > T} \frac{(p_n'(x) + x p_n(x))^2}{p_n(x)}\,dx \le 2J_{1,1} + 2J_{1,2} = 2\int_{|x| > T} \frac{p_n'(x)^2}{p_n(x)}\,dx + 2\int_{|x| > T} x^2 p_n(x)\,dx. $$

As we saw in (12.3),

$$ J_{1,1} = -p_n'(T)\log p_n(T) + p_n'(-T)\log p_n(-T) - \int_{|x| > T} p_n''(x)\log p_n(x)\,dx. $$

By (13.1) and (13.2), $|p_n'(\pm T)\log p_n(\pm T)| \le 2T^3 e^{-T^2/2}$ for all sufficiently large $n \ge n_T$. By Proposition 8.3, with some constant $c$, for all $x$,

$$ |p_n''(x)\log p_n(x)| \le c\,\frac{\log(e + |x|)}{1 + x^2}, $$

implying

$$ \int_{|x| > T} |p_n''(x)\log p_n(x)|\,dx \le c'\,T^{-1/2} $$
with some other constant $c'$. In addition, since $\int x^2 p_n(x)\,dx = \int x^2\varphi(x)\,dx = 1$, by (13.1) we have

$$ \int_{|x| > T} x^2 p_n(x)\,dx = \int_{|x| > T} x^2\,(p_n(x) - \varphi(x))\,dx + \int_{|x| > T} x^2\varphi(x)\,dx $$

$$ = -\int_{|x| \le T} x^2\,(p_n(x) - \varphi(x))\,dx + \int_{|x| > T} x^2\varphi(x)\,dx $$

$$ \le \int_{|x| \le T} x^2\,|p_n(x) - \varphi(x)|\,dx + \int_{|x| > T} x^2\varphi(x)\,dx \le 2T\,o(1) + 4T\varphi(T). $$

Hence, given $\varepsilon > 0$, one can choose $T$ such that $J_1 < \varepsilon$ for all $n$ large enough. This means that $J_1 = o(1)$, and recalling (13.3), we get $I(Z_n\|Z) = o(1)$.
Let us now return to the case $s \ge 3$.

Proof of Corollary 1.2. According to the expansion (11.8), which appeared in the proof of Lemma 11.1, Theorem 1.1 may equivalently be formulated as

$$ I(Z_n\|Z) = \sum_{l=0}^{s-4} (-1)^l \int_{-\infty}^{+\infty} \frac{w_s(x)^2\,u_s(x)^l}{\varphi(x)}\,dx + o\Big(\frac{1}{n^{(s-2)/2}\,(\log n)^{(s-3)/2}}\Big), \qquad (13.4) $$

where, as before,

$$ w_s(x) = \sum_{j=1}^{s-2} (q_j'(x) + x q_j(x))\,n^{-j/2}, \qquad u_s(x) = \sum_{j=1}^{s-2} \frac{q_j(x)}{\varphi(x)}\,n^{-j/2}. $$

This representation of the Fisher information distance is more convenient than (1.3) for applications such as Corollary 1.2. Assume that $s \ge 4$ and $\gamma_3 = \dots = \gamma_{k-1} = 0$ for a given integer $3 \le k \le s$ (with no restriction when $k = 3$). Then, by the definition (10.2), $q_1 = \dots = q_{k-3} = 0$, so

$$ w_s(x) = \sum_{j=k-2}^{s-2} (q_j'(x) + x q_j(x))\,n^{-j/2}, \qquad u_s(x) = \sum_{j=k-2}^{s-2} \frac{q_j(x)}{\varphi(x)}\,n^{-j/2}. \qquad (13.5) $$

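Hence, in order to isolate the leading term in (1.3) with the smallest power of $1/n$, one should take $l = 0$ in (13.4) and $j = k - 2$ in the first sum of (13.5). This gives

$$ I(Z_n\|Z) = \frac{1}{n^{k-2}} \int_{-\infty}^{+\infty} \frac{(q_{k-2}'(x) + x q_{k-2}(x))^2}{\varphi(x)}\,dx + O\big(n^{-(2k-3)/2}\big) + o\Big(\frac{1}{n^{(s-2)/2}\,(\log n)^{(s-3)/2}}\Big). $$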
Now, again according to (10.2), or as found in (11.10),

$$ q_{k-2}'(x) + x q_{k-2}(x) = \frac{\gamma_k}{(k-1)!}\,H_{k-1}(x)\,\varphi(x). $$




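Therefore, the sum in (1.3) will contain powers of $1/n$ starting from $1/n^{k-2}$. Using $\int H_m^2\varphi\,dx = m!$, the leading coefficient is

$$ c_{k-2} = \frac{\gamma_k^2}{((k-1)!)^2} \int_{-\infty}^{+\infty} H_{k-1}(x)^2\,\varphi(x)\,dx = \frac{\gamma_k^2}{(k-1)!}. $$

Thus, $c_1 = \dots = c_{k-3} = 0$, and we get

$$ I(Z_n\|Z) = \frac{\gamma_k^2}{(k-1)!}\,\frac{1}{n^{k-2}} + O\Big(\frac{1}{n^{k-1}}\Big) + o\Big(\frac{1}{n^{(s-2)/2}\,(\log n)^{(s-3)/2}}\Big). $$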
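The value of $c_{k-2}$ rests on $\int H_{k-1}^2\varphi\,dx = (k-1)!$. The sketch below evaluates the defining integral numerically for a few $k$ and an arbitrary, hypothetical value of $\gamma_k$. It then compares the result with $\gamma_k^2/(k-1)!$. The Simpson rule and the truncation to $[-12, 12]$ are illustrative choices.

```python
import math


def hermite(m, x):
    """Chebyshev-Hermite polynomial H_m(x), from H_{k+1} = x H_k - k H_{k-1}."""
    if m == 0:
        return 1.0
    h_prev, h = 1.0, x  # H_0(x), H_1(x)
    for k in range(1, m):
        h_prev, h = h, x * h - k * h_prev
    return h


def phi(x):
    """Standard normal density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def simpson(f, a, b, steps=20000):
    """Composite Simpson rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3


def leading_coefficient(gamma_k, k):
    """c_{k-2} = (gamma_k / (k-1)!)^2 * int H_{k-1}(x)^2 phi(x) dx, computed numerically."""
    integral = simpson(lambda x: hermite(k - 1, x) ** 2 * phi(x), -12.0, 12.0)
    return (gamma_k / math.factorial(k - 1)) ** 2 * integral


gamma = 0.7  # hypothetical value of the first non-vanishing cumulant gamma_k
numeric = {k: leading_coefficient(gamma, k) for k in range(3, 7)}
closed_form = {k: gamma ** 2 / math.factorial(k - 1) for k in range(3, 7)}
```

For instance, take $k = 4$ under the assumptions of Corollary 1.2, e.g. a symmetric $X_1$ with $\gamma_4 \ne 0$. The leading term is then $\gamma_4^2/(6n^2)$.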

14. Extensions to non-integer s. Lower bounds 

If $s > 2$ is not necessarily an integer, put $m = [s]$ (the integer part). Theorem 1.1 admits the following generalization. As before, let the normalized sums

$$ Z_n = \frac{X_1 + \dots + X_n}{\sqrt{n}} $$

be defined for independent identically distributed random variables with mean $\mathbb{E}X_1 = 0$ and variance $\mathrm{Var}(X_1) = 1$.

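Theorem 14.1. If $I(Z_{n_0}) < +\infty$ for some $n_0$, and $\mathbb{E}|X_1|^s < +\infty$ ($s > 2$), then

$$ I(Z_n\|Z) = \frac{c_1}{n} + \frac{c_2}{n^2} + \dots + \frac{c_{[(s-2)/2]}}{n^{[(s-2)/2]}} + o\Big(\frac{1}{n^{(s-2)/2}\,(\log n)^{(s-3)/2}}\Big), \qquad (14.1) $$

where the coefficients $c_j$ are the same as in (1.4).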

The proof is based on a certain extension and refinement of the local limit theorem 
described in Lemma 10.2. 

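Lemma 14.2. Assume that $I(Z_{n_0}) < +\infty$ for some $n_0$, and $\mathbb{E}|X_1|^s < +\infty$ ($s > 2$). Fix $l = 0, 1, \dots$ Then for all $n$ large enough, $Z_n$ have densities $p_n$ of class $C^l$ satisfying, as $n \to \infty$,

$$ (1 + |x|^m)\,\big(p_n^{(l)}(x) - \varphi_m^{(l)}(x)\big) = \psi_{l,n}(x)\,o\big(n^{-(s-2)/2}\big) \qquad (14.2) $$

uniformly for all $x$, with $\sup_x |\psi_{l,n}(x)| \le 1$ and $\int_{-\infty}^{+\infty} \psi_{l,n}(x)^2\,dx \le 1$. Moreover, uniformly for all $x$,

$$ (1 + |x|^s)\,\big(p_n^{(l)}(x) - \varphi_m^{(l)}(x)\big) = \psi_{l,n,1}(x)\,o\big(n^{-(s-2)/2}\big) + (1 + |x|^{s-m})\,\psi_{l,n,2}(x)\,\Big(O\big(n^{-(m-1)/2}\big) + o\big(n^{-(s-2)}\big)\Big), \qquad (14.3) $$

where $\sup_x |\psi_{l,n,j}(x)| \le 1$ and $\int_{-\infty}^{+\infty} \psi_{l,n,j}(x)^2\,dx \le 1$ ($j = 1, 2$).

Here we use the approximating functions $\varphi_m = \varphi + \sum_{k=1}^{m-2} q_k\,n^{-k/2}$ as before.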
When $l = 0$, and in a simpler form, namely with $\psi_{l,n,j} = 1$, this result was recently obtained in [B-C-G1]. In this case, the finiteness of the Fisher information may be relaxed to the boundedness of the densities. The more general case involving derivatives can be treated by an analysis similar to that developed in [B-C-G1], so we omit the details.

If $s = m$ is an integer, the Edgeworth-type expansions (14.2) and (14.3) coincide, and we are reduced to the statement of Lemma 10.2. However, if $s > m$, (14.3) improves on (14.2) on relatively large intervals such as $|x| \le T_n$, considered in Theorem 1.1 and defined in (11.1).

Proof of Theorem 14.1. With a few modifications, one can argue in the same way as in the proof of Theorem 1.1. First, in case $l = 0$, (14.3) yields, uniformly in $|x| \le T_n$,

$$ p_n(x) = \varphi_m(x) + \frac{1}{1 + |x|^s}\,o\big(n^{-(s-2)/2}\big). $$

Combined with a similar relation for the derivative ($l = 1$), this yields

$$ p_n'(x) + x p_n(x) = w_m(x) + \frac{1 + |x|}{1 + |x|^s}\,o\big(n^{-(s-2)/2}\big), $$

where $w_m(x) = \sum_{k=1}^{m-2} (q_k'(x) + x q_k(x))\,n^{-k/2}$. These two relations thus extend (11.2) and (11.3), which were the only ingredients of this kind needed in the proof of Lemma 11.1. Repeating the same arguments with the functions $u_m(x) = (\varphi_m(x) - \varphi(x))/\varphi(x)$, we can extend the expansion of Lemma 11.1, with the same remainder term, to general values $s > 2$.

In order to prove Lemma 12.1 with real $s > 2$, let us return to (12.1). The relation (12.2) extends to non-integer $s$ by the extended variant of Lemma 10.1 mentioned before. Thus our main concern is the integral $J_{1,1}$, which makes the most essential contribution to the resulting remainder term. Consider the part of this integral on the positive half-axis,

$$ J_{1,1}^+ = \int_{T_n}^{+\infty} \frac{p_n'(x)^2}{p_n(x)}\,dx = -p_n'(T_n)\log p_n(T_n) - \int_{T_n}^{+\infty} p_n''(x)\log p_n(x)\,dx. \qquad (14.4) $$

Applying (14.3) at $x = T_n$, we obtain (12.4) for real $s > 2$, that is,

$$ |p_n'(T_n)\log p_n(T_n)| = o\Big(\frac{1}{n^{(s-2)/2}\,(\log n)^{(s-3)/2}}\Big). $$

To prove (14.1), it remains to estimate the last integral in (14.4), which has to be treated with extra care. The argument uses both (14.2) and (14.3), applied on different parts of the half-axis $x > T_n$. For the set $A = \{x > T_n : p_n(x) \le \varphi(x)^4\}$ we have already obtained the general relation

$$ \int_A |p_n''(x)\log p_n(x)|\,dx = o\Big(\frac{1}{n^{s-2}}\Big), $$




which holds for all sufficiently large $n$ (without any moment assumption). Hence, with some constant $c$,

$$ \int_{T_n}^{4T_n} |p_n''(x)\log p_n(x)|\,dx \le c\int_{T_n}^{4T_n} x^2\,|p_n''(x)|\,dx + o\Big(\frac{1}{n^{s-2}}\Big). \qquad (14.5) $$

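Now, on the interval $[T_n, 4T_n]$ we apply Lemma 14.2 with $l = 2$ to approximate the second derivative. It yields

$$ \int_{T_n}^{4T_n} x^2\,|p_n''(x)|\,dx \le \int_{T_n}^{4T_n} x^2\,|\varphi_m''(x)|\,dx + o\big(n^{-(s-2)/2}\big) \int_{T_n}^{4T_n} \frac{x^2\,|\psi_{2,n,1}(x)|}{1 + x^s}\,dx + \Big(O\big(n^{-(m-1)/2}\big) + o\big(n^{-(s-2)}\big)\Big) \int_{T_n}^{4T_n} \frac{x^2\,(1 + x^{s-m})\,|\psi_{2,n,2}(x)|}{1 + x^s}\,dx. $$

Here, as in the proof of Lemma 12.1, the first integral on the right-hand side is bounded, up to a constant, by

$$ \int_{T_n}^{+\infty} x^4\varphi(x)\,dx = o\Big(\frac{1}{n^{(s-2)/2}\,(\log n)^{(s-3)/2}}\Big), $$

and for the second one, we use Cauchy's inequality to estimate it by $C\,T_n^{5/2 - s}$. Similarly, the last integral is bounded by

$$ \Big(\int_{T_n}^{4T_n} \frac{x^4\,(1 + x^{s-m})^2}{(1 + x^s)^2}\,dx\Big)^{1/2} \Big(\int_{-\infty}^{+\infty} \psi_{2,n,2}(x)^2\,dx\Big)^{1/2} \le C\,T_n^{5/2 - m}. $$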



Since $T_n$ grows logarithmically, we conclude that

$$ \int_{T_n}^{4T_n} x^2\,|p_n''(x)|\,dx = o\Big(\frac{1}{n^{(s-2)/2}\,(\log n)^{(s-3)/2}}\Big), $$

so a similar bound also holds for the left integral in (14.5).

To deal with the remaining values of x, we will consider the set = {x > 4T^ : 
Pn{x) < I e~^^ } and its complement 5*2 = (47"^, +00) \ 5"!. By Proposition 6.3, for all 
sufficiently large n, and with some constants c, c' we have 

/ \p'^{x)l0gPn{x)\dx < C \/pn{x) \logPn{x)\dx 
JSi JSi 

< c' / V^e-^^dx = o( -). 

J AT* \n' 
On the other hand, applying (14.2) on the set $S_2$, we get
/ \p'^{x)\ogp„{x)\dx\ < c \p'^{x)\^/x dx 

J S2 J S2 

< c' f x^/^(p{x)dx + c' [ J^^i/9 - of i\^/o ] 

Combining the two estimates, the theorem is proved. 

Remark 14.3. If $2 < s < 4$, the expansion (14.1) becomes

$$ I(Z_n\|Z) = o\Big(\frac{1}{n^{(s-2)/2}\,(\log n)^{(s-3)/2}}\Big). \qquad (14.6) $$

This formulation does not include the case $s = 2$. In case $s > 2$, we expect that the bound (14.6) may be improved further. However, a possible improvement may concern only the power of the logarithmic term. This can be illustrated by the example of densities of the form

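$$ p(x) = \int_{\sigma_0}^{+\infty} \varphi_\sigma(x)\,dP(\sigma) \qquad (x \in \mathbb{R}), $$

that is, mixtures of the densities $\varphi_\sigma$ of the normal distributions $N(0, \sigma^2)$ on the line with mean zero. Here $P$ is a (mixing) probability measure supported on the half-axis $(\sigma_0, +\infty)$ with $\sigma_0 > 0$. A natural variance constraint on $P$ is that

$$ \int_{-\infty}^{+\infty} x^2 p(x)\,dx = \int_{\sigma_0}^{+\infty} \sigma^2\,dP(\sigma) = 1, \qquad (14.7) $$

so we should assume that $0 < \sigma_0 < 1$.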

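First, let us note that, by the convexity of the Fisher information,

$$ I(p) \le \int_{\sigma_0}^{+\infty} I(\varphi_\sigma)\,dP(\sigma) = \int_{\sigma_0}^{+\infty} \frac{1}{\sigma^2}\,dP(\sigma) \le \frac{1}{\sigma_0^2}; $$

hence, $I(p)$ is finite. On the other hand, given $\eta > s/2$, it is possible to construct the measure $P$ satisfying (14.7) and

$$ D(Z_n\|Z) \ge \frac{c}{n^{(s-2)/2}\,(\log n)^{\eta}} $$

for all $n$ large enough, with a constant $c$ depending on $s$ and $\eta$ only (cf. [B-C-G2]).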
For example, one may define $P$ on the half-axis $[2, +\infty)$ by its density

$$ \frac{dP(\sigma)}{d\sigma} = \frac{c}{\sigma^{s+1}\,(\log\sigma)^{\eta}}, \qquad \sigma > 2, $$

and then extend it to any interval $[\sigma_0, 2]$ in an arbitrary way so as to obtain a probability measure satisfying the requirement (14.7). Since $I(Z_n\|Z) \ge 2D(Z_n\|Z)$ by (1.1), the bound (14.6) is sharp up to a logarithmic factor.
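The convexity bound $I(p) \le \int \sigma^{-2}\,dP(\sigma)$ can be illustrated on a toy mixing measure. For illustration only, the sketch below takes a two-point measure $P$ with atoms $\sigma = 0.5$ and $\sigma = 2$ and weights $0.8$ and $0.2$. This is not the measure constructed above, but it does satisfy (14.7). The sketch computes $I(p)$ by quadrature and compares it with the convexity bound and with the Cramér-Rao lower bound $1/\mathrm{Var}$. All names are hypothetical.

```python
import math


def normal_density(x, sigma):
    """Density of N(0, sigma^2)."""
    return math.exp(-x * x / (2 * sigma * sigma)) / (sigma * math.sqrt(2 * math.pi))


def simpson(f, a, b, steps=20000):
    """Composite Simpson rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3


def mixture_fisher_information(sigmas, weights, a=-30.0, b=30.0):
    """I(p) = int p'(x)^2 / p(x) dx for the finite normal mixture p = sum_i w_i N(0, sigma_i^2)."""
    def p(x):
        return sum(w * normal_density(x, s) for s, w in zip(sigmas, weights))

    def dp(x):
        # the derivative of the N(0, sigma^2) density is -x / sigma^2 times that density
        return sum(-x / s ** 2 * w * normal_density(x, s) for s, w in zip(sigmas, weights))

    return simpson(lambda x: dp(x) ** 2 / p(x), a, b)


sigmas, weights = (0.5, 2.0), (0.8, 0.2)  # hypothetical two-point mixing measure
variance = sum(w * s ** 2 for s, w in zip(sigmas, weights))  # equals 1, as in (14.7)
info = mixture_fisher_information(sigmas, weights)
convexity_bound = sum(w / s ** 2 for s, w in zip(sigmas, weights))  # int sigma^{-2} dP(sigma)
```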

Finally, let us mention that in case $s = 2$, $D(Z_n\|Z)$, and therefore $I(Z_n\|Z)$, may decay at an arbitrarily slow rate.